Journal of Statistical Physics manuscript No. 

(will be inserted by the editor) 






Avik Haider · Ansuman Adhikary

Statistical Physics Analysis of Maximum a Posteriori 
Estimation for Multi-channel Hidden Markov Models 



Received: date / Accepted: date 



Abstract The performance of maximum a posteriori (MAP) estimation is studied analytically for binary symmetric multi-channel hidden Markov processes. We reduce the estimation problem to a 1D Ising spin model and define order parameters that correspond to different characteristics of the MAP-estimated sequence. The solution to the MAP estimation problem has different operational regimes separated by first-order phase transitions. For an L-channel system with identical noise levels, the transition points are determined solely by whether L is odd or even, irrespective of the actual number of channels. We demonstrate that for lower noise intensities, the number of solutions is uniquely determined for odd L, whereas for even L there are exponentially many solutions. We also develop a semi-analytical approach to calculate the estimation error without resorting to brute-force simulations. Finally, we examine the tradeoff between a system with a single low-noise channel and one with multiple noisy channels.

1 Introduction

In many dynamical systems direct observation of the internal system state is not possible. Instead, one has noisy observations of those states. When the internal dynamics is governed by a Markovian process, the resulting systems are known as Hidden Markov Models, or HMMs. The HMM is the simplest tool for modeling systems in which correlated data passes through a noisy channel [1,2]. It is used extensively to model such physical systems and finds applications in various areas such as signal processing, speech recognition, and bioinformatics [3,4].

One of the major problems underlying HMMs is the estimation of the hidden state sequence given the observations, which is usually done through the maximum a posteriori (MAP) estimation technique. A natural characteristic of MAP estimation is its accuracy, i.e., the closeness of the original and estimated sequences. Another important characteristic is the number of solutions it produces in response to a given observed sequence. This is related to the concept of trackability, which describes whether the number of hidden state sequences grows polynomially (trackable) or exponentially (non-trackable) with the length of the observed sequence. For so-called weak models$^1$ it was established that there is a sharp transition between trackable and non-trackable regimes as one varies the noise [5]. A generalization to the probabilistic case was done for a binary symmetric HMM [6], where it was shown that non-trackability is related to a finite zero-temperature entropy of an appropriately defined Ising model. In particular, [6] demonstrated that there is a critical noise intensity above which the number of MAP solutions is exponential in the length of the observed sequence.

In this work we study the performance of MAP estimation for the scenario where the hidden dynamics is observed via multiple observation channels, which we refer to as multi-channel HMMs. A multi-channel HMM is a more robust way of modeling multi-channel signals, such as in sensor networks where local readings from different sensors are used to make inferences about the underlying process. Note that the channels can generally have different characteristics, i.e., measurement noise. Here we assume that the readings of the channels are conditionally independent given the underlying hidden state.

One of our main results is that the presence of additional observation channels does not always improve the inference in the sense of the above characteristics. In particular, in systems with an even number of channels and identical noise intensities, there is always an exponential number of solutions for any noise intensity, indicated by a finite zero-temperature entropy of the corresponding statistical physics system. Intuitively, this happens because of conflicting observations from different channels, which produce a macroscopic frustration of spins in the system. Furthermore, for two-channel systems with generally different noise characteristics, we calculate the phase boundary between the zero-entropy and non-zero-entropy regimes in the parameter space. In all the systems studied, with single or multiple channels, we observe discontinuous phase transitions in the thermodynamic quantities as the noise is varied. The phase transition points are the same for all systems with an even number of channels. The same holds among all systems with an odd number of channels, but their transition points differ from those of the even-channel systems.

For general L-channel systems with identical noise in the channels, we calculate the different statistical characteristics of MAP estimation and find that the average error between the true and estimated sequences decreases with the addition of channels. Furthermore, we introduce the notion of channel cost, which lets us explore the tradeoff between the total cost and the error that one can tolerate in the system. Finally, we also analyze the performance of MAP estimation for Gaussian noise, and find that the entropy is always zero, so that one has exactly one solution for every observed sequence. This suggests that the existence of exponentially many degenerate solutions is related to the discreteness of the observations.

Let us provide a brief outline of the structure of our paper. We start by providing some general information about maximum a posteriori estimation in Section 2. We introduce binary symmetric HMMs and describe the mapping to an Ising model in Section 3. In Section 3.3 we describe the statistical physics approach to MAP estimation for multi-channel binary HMMs and define appropriate order parameters for characterizing MAP performance. Section 4 describes the solution of the model. The recurrent states are given in Section 5, followed by a presentation of results in Section 6. Section 6.1 focuses on a detailed analysis of the two-channel scenario and Section 6.2 provides results for the multiple-channel scenario. In Section 6.3 we discuss the cost of designing a multiple-channel system and its impact on the system performance relative to a single-channel system. Finally, we conclude by discussing our results and future work. In the appendix, we give details about the analytical calculation of MAP accuracy and elaborate on the Gaussian observation model along with the main differences from the binary symmetric case.



2 MAP Estimation 

The present work focuses on a specific class of stochastic processes, namely binary symmetric HMMs, although the techniques of MAP estimation can be applied to general stochastic processes too. In this section, we give a brief overview of MAP estimation for generalized HMMs and defer the study of binary symmetric HMMs to Section 3.

Let us consider $\mathbf{x} = (x_1, \dots, x_N)$ as the signal generating sequence. The observations in the $L$ different channels can be generically written as $\mathbf{y} = (\mathbf{y}^1, \dots, \mathbf{y}^L)$ with $\mathbf{y}^i = (y^i_1, \dots, y^i_N)$, $i \in \{1, \dots, L\}$. Here $\mathbf{x}$ and $\mathbf{y}^i$ are the realizations of discrete-time random processes $\mathcal{X}$ and $\mathcal{Y}^i$, with $i$ being the index of the observation channel, as shown in Figure 1. The $\mathbf{y}^i$'s are the noisy observations of $\mathbf{x}$ obtained from the different observation channels. The observations from different channels are mutually independent given $\mathbf{x}$ and are described by the conditional probability $p(\mathbf{y}^i|\mathbf{x})$. The observed sequences $\mathbf{y}^i$ are considered to be given, along with the probabilities $p(\mathbf{y}^i|\mathbf{x})$ and $p(\mathbf{x})$. Thus, we are required to find the generating sequence $\mathbf{x}$ from the given sets of observation sequences.

Fig. 1 (Color online) Hidden Markov model with multiple observation sequences

$^1$ Weak models can be described as a simplification of HMMs where one specifies possible transitions and observations from a given state without assigning probabilities.

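MAP provides a method to estimate the generating sequence $\mathbf{x}$ on the basis of the observations $\mathbf{y}^i$. The estimate $\hat{\mathbf{x}}(\mathbf{y})$ is found by maximizing over $\mathbf{x}$ the posterior probability,

$$p(\mathbf{x}|\mathbf{y}^1,\dots,\mathbf{y}^L) = \frac{p(\mathbf{y}^1,\dots,\mathbf{y}^L|\mathbf{x})\,p(\mathbf{x})}{p(\mathbf{y}^1,\dots,\mathbf{y}^L)}$$

Since $p(\mathbf{y}^1,\dots,\mathbf{y}^L)$ does not depend on $\mathbf{x}$, we can equivalently minimize the Hamiltonian, which is given by

$$H(\mathbf{y},\mathbf{x}) = -\log[\,p(\mathbf{y}^1,\dots,\mathbf{y}^L|\mathbf{x})\,p(\mathbf{x})\,] \qquad (1)$$

where log denotes the natural logarithm. Because of the mutual independence of the observations conditioned on $\mathbf{x}$, we can rewrite $H(\mathbf{y},\mathbf{x})$ as

$$H(\mathbf{y},\mathbf{x}) = -\log[p(\mathbf{x})] - \sum_{i=1}^{L}\log[p(\mathbf{y}^i|\mathbf{x})] \qquad (2)$$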

The advantage of using $H(\mathbf{y},\mathbf{x})$ is that, if $\mathcal{Y}$ is ergodic$^2$ (which we will assume in the rest of the paper), then for $N \gg 1$, $H(\mathbf{y},\hat{\mathbf{x}}(\mathbf{y}))$ will be independent of $\mathbf{y}$ for all $\mathbf{y} \in \Omega_N(\mathcal{Y})$, $\Omega_N(\mathcal{Y})$ being the typical set of $\mathcal{Y}$. The typical set is the set of sequences whose sample entropy is close to the true entropy, where entropy is a measure of uncertainty of a random variable and is a function of the probability distribution of the sequences in $\mathcal{Y}$. The typical set has total probability close to one, a consequence of the asymptotic equipartition property (AEP) [7]. Thus for $N \gg 1$, the sequence $(\mathbf{y}^1,\dots,\mathbf{y}^L)$ will lie in the typical set of $\mathcal{Y}$ with high probability. As a result, we have $\sum_{\mathbf{y}\in\Omega_N(\mathcal{Y})} p(\mathbf{y}) \to 1$ and all elements of $\Omega_N(\mathcal{Y})$ have equal probability. We can take advantage of this and use for $H(\mathbf{y},\hat{\mathbf{x}}(\mathbf{y}))$ the averaged quantity $\sum_{\mathbf{y}} p(\mathbf{y})\,H(\mathbf{y},\hat{\mathbf{x}}(\mathbf{y}))$.

$^2$ Ergodicity implies that the time average is equal to the ensemble average, i.e., an ergodic process has the same behavior averaged over time as averaged over the space of all its states.

Let us consider some extreme cases of noise in the observation channels. Since we are dealing with more than one observation channel, we have to consider the overall contribution from channels with varying noise. When we refer to weak and strong noise in the channels, we mean it with respect to the noisiest channel of the whole set. The other channels are relatively cleaner and modify the overall performance. (i) When the noise applied to the channels is very weak, $p(\mathbf{x}|\mathbf{y}) \approx \prod_{i=1}^{L}\prod_{k=1}^{N}\delta(x_k - y^i_k)$, so that the generating sequence is recovered almost exactly. (ii) For strong noise in the channels the estimation is prior dominated, i.e., $p(\mathbf{x}|\mathbf{y}) \approx p(\mathbf{x})$, and hence is not informative. If no prior is imposed, $p(\mathbf{x}) \propto \mathrm{const}$, and MAP estimation reduces to the maximum likelihood (ML) estimation scheme. It should be noted that the ML estimate also recovers the exact sequence in the low-noise regime.

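Minimization of the Hamiltonian (1) for a given $\mathbf{y}$ can be readily done using the Viterbi algorithm, but it produces one single optimal estimate $\hat{\mathbf{x}}(\mathbf{y})$. For completeness we should also seek other possible sequences $\hat{\mathbf{x}}^{[\gamma]}(\mathbf{y})$ for which $H(\mathbf{y},\hat{\mathbf{x}}^{[\gamma]}(\mathbf{y}))$ (greater than $H(\mathbf{y},\hat{\mathbf{x}}(\mathbf{y}))$) is almost equal to the optimal value of the Hamiltonian in the large-$N$ limit. Under that approximation we have

$$\lim_{N\to\infty}\frac{H(\mathbf{y},\hat{\mathbf{x}}^{[\gamma]}(\mathbf{y}))}{N} = \lim_{N\to\infty}\frac{H(\mathbf{y},\hat{\mathbf{x}}(\mathbf{y}))}{N}$$

The sequences obtained in this way from the MAP estimate are equivalent for $N \to \infty$ and can be listed as,

$$\hat{\mathbf{x}}^{[\gamma]}(\mathbf{y}), \qquad \gamma = 1, \dots, \mathcal{N}(\mathbf{y}) \qquad (3)$$

with $\mathcal{N}(\mathbf{y})$ denoting the number of such sequences. If $\log\mathcal{N}(\mathbf{y}) \propto N$, the ergodicity argument can be used to obtain the logarithm of the number of solutions for the observed typical sequence,

$$\Theta = \sum_{\mathbf{y}} p(\mathbf{y})\log\mathcal{N}(\mathbf{y}) \qquad (4)$$

In the limit $N \to \infty$, a finite value of $\theta = \Theta/N$ implies that there are exponentially many outcomes of minimizing $H(\mathbf{y},\mathbf{x})$ over $\mathbf{x}$. The quantity $\theta$ is called entropy by analogy with the Ising spin model, as will be explained in detail in the subsequent sections.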

Various moments of $\hat{\mathbf{x}}^{[\gamma]}(\mathbf{y})$ are calculated, which are random variables due to the dependence on $\mathbf{y}$. Knowledge of these moments, together with the error analysis, is employed to characterize the accuracy of the estimation. For weak noise these moments are close to those of the original process $\mathcal{X}$. We also evaluate the average overlap between the estimated sequences $\hat{\mathbf{x}}^{[\gamma]}(\mathbf{y})$ and the observed sequences $\mathbf{y}$. An overlap close to one would imply an observation-dominated estimation of the sequence.



3 Binary symmetric Hidden Markov Processes 

3.1 Definition 

We analyze a binary, discrete-time Markov stochastic process $\mathcal{X} = (x_1, x_2, \dots, x_N)$. Each random variable $x_k$ can have only two realizations $x_k = \pm 1$. The Markov feature implies,

$$p(\mathbf{x}) = \prod_{k=2}^{N} p(x_k|x_{k-1})\,p(x_1) \qquad (5)$$

where $p(x_k|x_{k-1})$ is the time-independent transition probability of the Markov process. The state diagram for the binary, discrete-time Markov process is shown in Figure 2. We parameterize the binary symmetric situation by a single number $0 < q < 1$, with $p(1|1) = p(-1|-1) = 1-q$ and $p(1|-1) = p(-1|1) = q$, and the stationary state distribution is uniform, $p_{st}(1) = p_{st}(-1) = \frac{1}{2}$. The noise process is assumed to be memory-less, time independent and unbiased. Thus,

$$p(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{L}\prod_{k=1}^{N}\pi^i(y^i_k|x_k), \qquad y^i_k = \pm 1 \qquad (6)$$

where $\pi^i(y^i_k|x_k)$ is the probability of observing $y^i_k$ in channel $i$ given the state $x_k$. For channel $i$, $\pi^i(-1|1) = \pi^i(1|-1) = \epsilon_i$ and $\pi^i(1|1) = \pi^i(-1|-1) = 1-\epsilon_i$, $\epsilon_i$ being the probability of error in the channel. Here memory-less refers to the factorization over time in (6), time independence refers to the fact that $\pi^i(\dots|\dots)$ does not depend on $k$, and unbiased means that the noise acts symmetrically on both realizations of the Markov process, i.e., $\pi^i(1|-1) = \pi^i(-1|1)$. Starting from this, we vary the noise between multiple observation channels and study its effect on the sequence decoding process. In Appendix 8.2 we discuss a more general case involving a Gaussian distribution of noise realizations in the observation channels. There, the detailed formalism is worked out for the case of a single observation channel.

The composite process $\mathcal{X}\mathcal{Y}^i$ with realizations $(y^i_k, x_k)$ is Markov even though $\mathcal{Y}^i$ in general is not a Markov process. The transition probabilities for the $i$-th observation channel can be written as,

$$p(y^i_{k+1}, x_{k+1}|y^i_k, x_k) = \pi^i(y^i_{k+1}|x_{k+1})\,p(x_{k+1}|x_k) \qquad (7)$$
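The generative model of equations (5)-(7) can be illustrated with a short sampler. This is a minimal sketch rather than the authors' code: the function name `sample_multichannel_hmm`, the use of NumPy, and all numerical values are assumptions made here for illustration.

```python
import numpy as np

def sample_multichannel_hmm(N, q, eps, rng):
    """Draw a hidden chain x (values +1/-1) and L noisy observation channels y.

    q   : flip probability of the hidden symmetric Markov chain
    eps : sequence of per-channel error probabilities (length L)
    """
    eps = np.asarray(eps, dtype=float)
    # Hidden chain: start from the uniform stationary distribution, then
    # flip the sign with probability q at every step.
    steps = np.where(rng.random(N - 1) < q, -1, 1)
    x = rng.choice([-1, 1]) * np.concatenate(([1], np.cumprod(steps)))
    # Each channel flips every symbol independently with probability eps[i],
    # so the channels are conditionally independent given x.
    flipped = rng.random((eps.size, N)) < eps[:, None]
    y = np.where(flipped, -x, x)
    return x, y
```

Calling `sample_multichannel_hmm(1000, 0.24, [0.1, 0.3], np.random.default_rng(0))` returns a hidden sequence of length 1000 and a 2 x 1000 array of observations, one row per channel.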



Fig. 2 (Color online) State diagram for a binary symmetric Markov chain



3.2 Mapping to the Ising model 

The problem can be efficiently mapped to the Ising spin model, where we represent the transition probabilities as

$$p(x_k|x_{k-1}) = \frac{e^{J x_k x_{k-1}}}{2\cosh J}, \qquad J = \frac{1}{2}\log\left[\frac{1-q}{q}\right] \qquad (8)$$

The observation probabilities for binary symmetric noise are given as

$$\pi^i(y^i_k|x_k) = \frac{e^{h_i y^i_k x_k}}{2\cosh h_i}, \qquad h_i = \frac{1}{2}\log\left[\frac{1-\epsilon_i}{\epsilon_i}\right] \qquad (9)$$

and that for the Gaussian distribution of noise in the observation channels is given by

$$\pi^i(y^i_k|x_k) = \frac{1}{\sqrt{2\pi\sigma_i^2}}\exp\left[-\frac{(y^i_k - x_k)^2}{2\sigma_i^2}\right] \qquad (10)$$

where $\sigma_i^2$ is the noise variance.

Combining (8) and (9) with the factorizations (5) and (6), we represent the log-likelihood for the case of binary symmetric noise as

$$H_B(\mathbf{y},\mathbf{x}) = -J\sum_{k=2}^{N} x_k x_{k-1} - \sum_{i=1}^{L} h_i\sum_{k=1}^{N} y^i_k x_k \qquad (11)$$

The same for the Gaussian distribution of noise (see (10)) is given as

$$H_G(\mathbf{y},\mathbf{x}) = -J\sum_{k=2}^{N} x_k x_{k-1} + \sum_{i=1}^{L}\frac{1}{2\sigma_i^2}\sum_{k=1}^{N}(y^i_k - x_k)^2 \qquad (12)$$

which simply reduces to

$$H_G(\mathbf{y},\mathbf{x}) = -J\sum_{k=2}^{N} x_k x_{k-1} - \sum_{i=1}^{L}\frac{1}{\sigma_i^2}\sum_{k=1}^{N} y^i_k x_k \qquad (13)$$

The redundant additive terms are omitted from the final expressions of the Hamiltonian. The subscripts 'B' and 'G' denote the binary and Gaussian cases respectively. $H(\mathbf{y},\mathbf{x})$ (representing the general Hamiltonian for both 'B' and 'G') is the Hamiltonian of a one-dimensional (1D) Ising model with external random fields governed by the probability $p(\mathbf{y}|\mathbf{x})$ [8]. The factor $J$ is the spin-spin interaction constant, uniquely determined by the transition probability $q$. For $q < \frac{1}{2}$, the constant $J$ is positive. This refers to the ferromagnetic situation where the spins tend to align in the same direction. The random fields (within the Ising model) appearing in the expression for $H(\mathbf{y},\mathbf{x})$ display a non-Markovian correlation. This differs from the random fields usually considered in the literature, which are in general uncorrelated [9,10]. For all calculations, we assume $h_i > 0$. Let us also introduce another parameter $\alpha = h_2/h_1$, which represents the ratio of the noise between two observation channels (subscripts 1 and 2 refer to the two observation channels) and is assumed to be a positive integer.
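The mapping for binary symmetric noise can be checked numerically: the negative log of the joint probability $p(\mathbf{y}|\mathbf{x})p(\mathbf{x})$ should differ from the Ising energy (11) only by a constant that does not depend on $\mathbf{x}$. The sketch below is an illustration under assumptions; the function names `couplings`, `ising_energy` and `neg_log_joint` are hypothetical and not taken from the paper.

```python
import numpy as np

def couplings(q, eps):
    """Ising couplings: J = 0.5*log((1-q)/q), h_i = 0.5*log((1-eps_i)/eps_i)."""
    eps = np.asarray(eps, dtype=float)
    return 0.5 * np.log((1 - q) / q), 0.5 * np.log((1 - eps) / eps)

def ising_energy(x, y, J, h):
    """Binary-noise Hamiltonian (11) for spins x (N,) and observations y (L, N)."""
    h = np.asarray(h, dtype=float)
    return -J * np.sum(x[1:] * x[:-1]) - np.sum(h[:, None] * y * x[None, :])

def neg_log_joint(x, y, q, eps):
    """-log[p(y|x) p(x)] computed directly from the probabilities q and eps."""
    eps = np.asarray(eps, dtype=float)
    logp = np.log(0.5)  # uniform stationary distribution for x_1
    logp += np.sum(np.where(x[1:] == x[:-1], np.log(1 - q), np.log(q)))
    same = y == x[None, :]
    logp += np.sum(np.where(same, np.log(1 - eps)[:, None], np.log(eps)[:, None]))
    return -logp
```

For a chain of length $N$ the two quantities differ by $\log 2 + (N-1)\log(2\cosh J) + N\sum_i \log(2\cosh h_i)$, which is the redundant additive term dropped from (11).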



3.3 Statistical Physics of MAP estimation 

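We now implement the MAP estimation to minimize the Hamiltonian $H(\mathbf{y},\mathbf{x})$. In Section 2 we argued that this is equivalent to minimizing $\sum_{\mathbf{y}} p(\mathbf{y})\,H(\mathbf{y},\hat{\mathbf{x}}(\mathbf{y}))$. For this purpose let us introduce a non-zero temperature $T = 1/\beta > 0$ and write the conditional probability as,

$$p(\mathbf{x}|\mathbf{y}) = \frac{e^{-\beta H(\mathbf{y},\mathbf{x})}}{Z(\mathbf{y})}, \qquad Z(\mathbf{y}) = \sum_{\mathbf{x}} e^{-\beta H(\mathbf{y},\mathbf{x})} \qquad (14)$$

where $Z(\mathbf{y})$ is the partition function. In the language of statistical physics, $p(\mathbf{x}|\mathbf{y})$ gives the probability distribution of states $\mathbf{x}$ for a system with Hamiltonian $H(\mathbf{y},\mathbf{x})$. The system is assumed to be interacting with a thermal bath at temperature $T$, and with frozen random fields $h_i y^i_k$ (see (11)). For $T \to 0$ and a given $\mathbf{y}$, the individual terms of the partition function are strongly peaked at those $\hat{\mathbf{x}}(\mathbf{y})$ which minimize the Hamiltonian $H(\mathbf{y},\mathbf{x})$, i.e., at the ground states. If, however, the limit $T \to 0$ is applied after the limit $N \to \infty$ we get,

$$p(\mathbf{x}|\mathbf{y}) \to \frac{1}{\mathcal{N}(\mathbf{y})}\sum_{\gamma=1}^{\mathcal{N}(\mathbf{y})}\delta[\mathbf{x} - \hat{\mathbf{x}}^{[\gamma]}(\mathbf{y})] \qquad (15)$$

where $\hat{\mathbf{x}}^{[\gamma]}(\mathbf{y})$ and $\mathcal{N}(\mathbf{y})$ are given by (3). We are going to work in this low temperature regime from now on. The average of $H(\mathbf{y},\mathbf{x})$ in the $T \to 0$ regime will equal its value minimized over $\mathbf{x}$,

$$\sum_{\mathbf{x},\mathbf{y}} p(\mathbf{y})\,p(\mathbf{x}|\mathbf{y})\,H(\mathbf{y},\mathbf{x}) = \sum_{\mathbf{y}} p(\mathbf{y})\,H(\mathbf{y},\hat{\mathbf{x}}^{[1]}(\mathbf{y})) = H(\mathbf{y},\hat{\mathbf{x}}^{[1]}(\mathbf{y})) \qquad (16)$$

where by assumption all ground state configurations have the same energy, $H(\mathbf{y},\hat{\mathbf{x}}^{[\gamma]}(\mathbf{y})) = H(\mathbf{y},\hat{\mathbf{x}}^{[1]}(\mathbf{y}))$ for any $\gamma$.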
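One ground state of the binary-noise Hamiltonian (11) can be found with the Viterbi algorithm, written here in min-sum form over the two spin values. This is a minimal sketch under assumptions, not the implementation used in the paper; the name `viterbi_map` and the argument layout (observations as an L x N array of +1/-1 values) are choices made for this example.

```python
import numpy as np

def viterbi_map(y, J, h):
    """Return one minimizer of H_B(y, x) over x in {-1,+1}^N and its energy."""
    field = (np.asarray(h, dtype=float)[:, None] * np.asarray(y)).sum(axis=0)
    states = np.array([-1, 1])
    N = field.size
    cost = -field[0] * states            # best energy of a path ending in each state
    back = np.zeros((N, 2), dtype=int)   # back[k, b]: best predecessor of state b
    for k in range(1, N):
        # trans[a, b]: energy if x_{k-1} = states[a] and x_k = states[b]
        trans = cost[:, None] - J * np.outer(states, states)
        back[k] = trans.argmin(axis=0)
        cost = trans.min(axis=0) - field[k] * states
    # Trace the optimal path backwards from the best final state.
    x = np.empty(N, dtype=int)
    idx = int(cost.argmin())
    for k in range(N - 1, -1, -1):
        x[k] = states[idx]
        idx = back[k, idx]
    return x, float(cost.min())
```

When several configurations share the minimal energy, `argmin` picks one of them arbitrarily, which is exactly the situation in which the number of solutions $\mathcal{N}(\mathbf{y})$ exceeds one.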

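The zero-temperature entropy, which measures the number of MAP solutions, is given as

$$\Theta = -\sum_{\mathbf{x},\mathbf{y}} p(\mathbf{y})\,p(\mathbf{x}|\mathbf{y})\log p(\mathbf{x}|\mathbf{y}) = \sum_{\mathbf{y}} p(\mathbf{y})\log\mathcal{N}(\mathbf{y}) \qquad (17)$$

Another important statistical quantity, the free energy, can be defined as

$$F(J,h,T) = -T\sum_{\mathbf{y}} p(\mathbf{y})\log\sum_{\mathbf{x}} e^{-\beta H(\mathbf{y},\mathbf{x})} \qquad (18)$$

where the Ising Hamiltonian $H(\mathbf{y},\mathbf{x})$ is given by (11) or (13). With the help of this definition we can express the entropy in terms of the free energy as

$$\Theta = -\partial_T F|_{T\to 0} \qquad (19)$$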
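For very short chains the degeneracy of the ground state can be counted by exhaustive enumeration, which gives a feel for what $\log\mathcal{N}(\mathbf{y})$ measures. The sketch below is illustrative only: it counts exactly degenerate minima of (11) at finite $N$, used here as a proxy for the asymptotic count in (3), and the name `ground_states` and the tolerance `tol` are assumptions.

```python
import itertools
import numpy as np

def ground_states(y, J, h, tol=1e-9):
    """Enumerate all minimizers of H_B(y, x) by brute force (small N only)."""
    field = (np.asarray(h, dtype=float)[:, None] * np.asarray(y)).sum(axis=0)
    best, sols = np.inf, []
    for conf in itertools.product([-1, 1], repeat=field.size):
        x = np.array(conf)
        energy = -J * np.sum(x[1:] * x[:-1]) - np.sum(field * x)
        if energy < best - tol:
            best, sols = energy, [conf]   # strictly better: restart the list
        elif abs(energy - best) <= tol:
            sols.append(conf)             # tie: one more ground state
    return best, sols
```

With two identical channels that disagree on the middle symbol, for example `y = [[1, 1, -1], [1, -1, -1]]`, `h = [1, 1]` and `J = 1`, the local field on the middle spin vanishes and both `(1, 1, -1)` and `(1, -1, -1)` minimize the energy.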

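We also define the order parameters, which are the relevant characteristics of MAP, below,

$$c = \sum_{\mathbf{x},\mathbf{y}} p(\mathbf{y})\,p(\mathbf{x}|\mathbf{y})\,\frac{1}{N}\sum_{k=1}^{N-1} x_k x_{k+1} = -\partial_J f \qquad (20)$$

$$v = \sum_{\mathbf{x},\mathbf{y}} p(\mathbf{y})\,p(\mathbf{x}|\mathbf{y})\,\frac{1}{NL}\sum_{i=1}^{L}\sum_{k=1}^{N} y^i_k x_k = -\frac{1}{L}\,\partial_h f \qquad (21)$$

where $f = F/N$ is the free energy per spin and the derivative in (21) is taken for identical fields $h_i = h$. The parameter $c$ accounts for the correlation between the neighboring spins in the estimated sequence, and $v$ measures the overlap between the estimated and the observed sequences for the various cases that are studied.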

The order parameters defined above are indirect measures of estimation accuracy. A direct measure of the error requires evaluating the overlap between the actual hidden sequence that generates a given observation sequence and the sequence inferred from that observation sequence. Since we are interested in average quantities, calculating the average error amounts to finding the overlap between a typical hidden sequence and its MAP estimate, i.e.,

$$\Delta = \frac{1}{N}\sum_{k=1}^{N} s_k x_k \qquad (22)$$

where $\mathbf{s} = (s_1, \dots, s_N)$ is a typical sequence generated by the (hidden) Markov chain. Let us define a modified Hamiltonian,

$$H_g(\mathbf{y},\mathbf{x};\mathbf{s}) = -J\sum_{k=2}^{N} x_k x_{k-1} - \sum_{i=1}^{L} h_i\sum_{k=1}^{N} y^i_k x_k - g\sum_{k=1}^{N} s_k x_k \qquad (23)$$

Further, let us introduce $f_g$, which is related to the free energy $F_g$ of the modified Hamiltonian by $f_g = F_g/N$ and is given by,

$$f_g = -\frac{T}{N}\sum_{\mathbf{y},\mathbf{s}} p(\mathbf{y},\mathbf{s})\log\sum_{\mathbf{x}} e^{-\beta H_g(\mathbf{y},\mathbf{x};\mathbf{s})} \qquad (24)$$

With this the overlap is simply given by $\Delta = -\partial_g f_g|_{g\to 0,\,\beta\to\infty}$, where the limits are taken in the order



4 Solution of the Model 

Here we will solve for the order parameters and entropy of multiple observation channels using the tools developed for the Ising spin model. The solution of the model for $L = 1$ was provided in [6]. To make the paper self-contained, we repeat the main steps of the derivation here. Let us recall the partition function (14), which for $L$ observation sequences can be written as,

$$Z(\mathbf{y}) = \sum_{x_1=\pm 1,\dots,x_N=\pm 1}\exp\left[\beta J\sum_{k=1}^{N-1} x_{k+1}x_k + \beta\sum_{i=1}^{L} h_i\sum_{k=1}^{N} y^i_k x_k\right] \qquad (25)$$

Summing over the first spin yields the following transformation for $Z(\mathbf{y})$,

$$\sum_{x_1,\dots,x_N} e^{\beta J\sum_{k=1}^{N-1} x_{k+1}x_k + \beta\sum_{i=1}^{L} h_i\sum_{k=1}^{N} y^i_k x_k} = \sum_{x_2,\dots,x_N} e^{\beta J\sum_{k=2}^{N-1} x_{k+1}x_k + \beta\sum_{i=1}^{L} h_i\sum_{k=3}^{N} y^i_k x_k + \beta\xi_2 x_2 + \beta B(\xi_1)} \qquad (26)$$

where $\xi_2 = \sum_{i=1}^{L} h_i y^i_2 + A(\xi_1)$, $\xi_1 = \sum_{i=1}^{L} h_i y^i_1$ and,

$$A(u) = \frac{1}{2\beta}\log\frac{\cosh[\beta J + \beta u]}{\cosh[\beta J - \beta u]} \qquad (27)$$

$$B(u) = \frac{1}{2\beta}\log\left(4\cosh[\beta J + \beta u]\cosh[\beta J - \beta u]\right) \qquad (28)$$

Hence, once the first spin is excluded from the chain, the field acting on the second spin changes from $\sum_{i=1}^{L} h_i y^i_2$ to $\sum_{i=1}^{L} h_i y^i_2 + A(\xi_1)$. In the zero temperature limit ($T \to 0$; $\beta \to \infty$) the functions $A(u)$ and $B(u)$ reduce to the following form,

$$A(u) = u\,\vartheta(J - |u|) + J\,\vartheta(u - J) - J\,\vartheta(-u - J) \qquad (29)$$

$$B(u) = J\,\vartheta(J - |u|) + u\,\vartheta(u - J) - u\,\vartheta(-u - J) \qquad (30)$$

where $\vartheta(x) = 0$ for $x < 0$ and $\vartheta(x) = 1$ for $x > 0$. Thus, after excluding one spin at a time and repeating the above steps, the partition function for the system is obtained as,

$$Z(\mathbf{y}) = \exp\left[\beta\sum_{k=1}^{N} B(\xi_k)\right] \qquad (31)$$

where $\xi_k$ is obtained from the recursion relation

$$\xi_k = \sum_{i=1}^{L} h_i y^i_k + A(\xi_{k-1}), \qquad k = 1, 2, \dots, N, \qquad \xi_0 = 0 \qquad (32)$$

Following similar steps with $H_G$ from (13) we get the recursion relation for the Gaussian realization of noise in the observation channels as,

$$\xi_k = \sum_{i=1}^{L}\frac{1}{\sigma_i^2}\,y^i_k + A(\xi_{k-1}), \qquad k = 1, 2, \dots, N, \qquad \xi_0 = 0 \qquad (33)$$

The $\xi_k$'s from the above random recursion relation are random quantities governed by the probability $p(\mathbf{y})$. For binary realizations the observations $y^i_k$ take a finite number of values, whereas at finite temperature $\xi_k$ can take an infinite number of values. In the asymptotic $\beta \to \infty$ limit, however, $A(u)$ and $B(u)$ reduce to the simple forms given in (29) and (30) respectively, so that $\xi_k$ takes only a finite number of values (the number can be large or small). For the Gaussian distribution, in contrast, we get a continuous set of values for $\xi_k$.
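The zero-temperature forms of $A(u)$ and $B(u)$ amount to clipping $u$ to the interval $[-J, J]$ and taking $\max(|u|, J)$, respectively. The sketch below compares them with the finite-temperature expressions and iterates the recursion for the binary case. The function names are hypothetical, and `np.logaddexp` is used only as a numerically stable way to evaluate $\log(2\cosh z)$.

```python
import numpy as np

def A_finite(u, J, beta):
    """Finite-temperature A(u): log-ratio of the two cosh terms over 2*beta."""
    a, b = beta * (J + u), beta * (J - u)
    return (np.logaddexp(a, -a) - np.logaddexp(b, -b)) / (2 * beta)

def B_finite(u, J, beta):
    """Finite-temperature B(u): log of 4*cosh*cosh over 2*beta."""
    a, b = beta * (J + u), beta * (J - u)
    return (np.logaddexp(a, -a) + np.logaddexp(b, -b)) / (2 * beta)

def A_zero(u, J):
    """Zero-temperature A(u): u inside [-J, J], saturating at +J or -J outside."""
    return np.clip(u, -J, J)

def B_zero(u, J):
    """Zero-temperature B(u): J inside [-J, J], |u| outside."""
    return np.maximum(np.abs(u), J)

def effective_fields(y, h, J):
    """Iterate xi_k = sum_i h_i y^i_k + A(xi_{k-1}) with xi_0 = 0 at T = 0."""
    local = (np.asarray(h, dtype=float)[:, None] * np.asarray(y)).sum(axis=0)
    xi = np.empty(local.size)
    prev = 0.0
    for k in range(local.size):
        xi[k] = local[k] + A_zero(prev, J)
        prev = xi[k]
    return xi
```

For two identical channels with `h = [1, 1]`, `J = 0.5` and observations `[[1, -1, 1], [1, 1, 1]]`, the local fields are 2, 0, 2 and the recursion gives the effective fields 2, 0.5, 2.5: the conflicting pair at the second step leaves the spin with a field equal to $J$.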

The parametric form of $\xi_k$ has already been explained in [6] for a single observation channel. Here we provide the generalization to two observation channels, i.e. $L = 2$, which can be easily extended to the multiple channel case. We can parameterize $\xi_k$ as $\xi_k(n_1,n_2,n_3) = n_1 h_1 + n_2 h_2 + n_3 J = [n_1 + \alpha n_2]h_1 + n_3 J$, where $h_2/h_1 = \alpha$, $n_1$ and $n_2$ can be positive or negative integers, while $n_3$ can take only three values $0, \pm 1$. It may be noted that the states $\xi(n_1,n_2,0)$ are not recurrent: once $\xi_k$ takes a value with [$(n_1 + \alpha n_2)$ positive or negative, $n_3 = \pm 1$], it never comes back to a state $\xi(n_1,n_2,0)$. Thus in the limit $N \gg 1$ we can ignore the states $\xi(n_1,n_2,0)$.

Let us now consider the problem of finding the stationary distribution of the random process given by (32). Note that the process $\mathcal{Y}$ with probabilities $p(\mathbf{y})$ is not Markovian, hence the process $\xi_k$ in (32) is not Markovian either, which slightly complicates the calculation of its stationary distribution. To this end, let us consider an auxiliary Markov process $\mathcal{Z}$, which has identical statistical characteristics to the process $\mathcal{X}$. The realization of $\mathcal{Z}$ is denoted by $z$, so as not to confuse it with the original process $\mathbf{x}$. We need to include $\mathcal{Z}$ to make the composite process Markov: the realizations $[\xi, y^1, \dots, y^L]$ alone are not Markov, so we enlarge them to $[\xi, y^1, \dots, y^L, z]$. The conditional probability for this enlarged process is given by,

$$\omega(\xi, y^1, \dots, y^L, z\,|\,\xi', y'^1, \dots, y'^L, z') = p(z|z')\,\varphi(\xi\,|\,\xi', y^1, \dots, y^L)\prod_{i=1}^{L}\pi^i(y^i|z) \qquad (34)$$

Here $p(z|z')$ and $\pi^i(y^i|z)$ refer to the Markov process $\mathcal{Z}$ and the noise respectively, while $\varphi(\xi\,|\,\xi', y^1, \dots, y^L)$ takes only two values, 0 and 1, depending on whether the transition is allowed by the recursion (32). We need to determine $\varphi(\xi\,|\,\xi', y^1, \dots, y^L)$ after finding all possible values of $\xi_k$. Let us first relate the stationary probabilities $\omega(\xi, y^1, \dots, y^L, z)$ of the composite Markov process to the characteristics of MAP estimation: $\omega(\xi, y^1, \dots, y^L, z)$ determines the stationary probabilities $\omega(\xi)$. Then, using the definitions of the partition function (31) and the free energy (18), and the ergodicity of the composite Markov process, the free energy is given as [9],

$$-f(J,h) = -F(J,h)/N = \sum_{\xi}\omega(\xi)\,B(\xi) \qquad (35)$$

where the summation is taken over all possible values of $\xi$ [for a given $(J,h)$]. We study different processes in the multi-channel scenario: $L > 2$ with the same noise in all channels, and $L = 2$ with different noise in the two channels. For all these cases the $h_i$'s can be represented by a single quantity $h$. Once we find $f(J,h)$ we can apply (20), (21) to do further calculations. To calculate the entropy we first derive a convenient expression for the free energy, which can be obtained from (30), (31).

$$F(\mathbf{y}) = -\frac{T}{2}\sum_{k=1}^{N}\sum_{\pm}\log[2\cosh\{\beta(\xi_k \pm J)\}] \qquad (36)$$

Using this we find,

$$\partial_T F(\mathbf{y})|_{T\to 0} = -\frac{\log 2}{2}\sum_{k=1}^{N}\delta(\xi_k \pm J)\Big|_{T\to 0} \qquad (37)$$

where $\delta(\cdot)$ is the Kronecker delta function and both signs are summed over. In the $N \gg 1$ limit, under the assumption that the Markov process is ergodic, $\frac{1}{N}\sum_{k=1}^{N}\delta(\xi_k \pm J)$ should with probability one (for the elements of the typical set $\Omega_N(\mathcal{Y})$) converge to $\omega(\xi = J) + \omega(\xi = -J)$. We thus obtain [9],

$$\theta \equiv \Theta/N = \frac{\log 2}{2}\left[\omega(J) + \omega(-J)\right] \qquad (38)$$

The above formula implies that, for an Ising spin model, the zero-temperature entropy can be extensive only when the external field $\xi$ acting on a spin has the same energy, $\xi x_k = \pm J$, as the spin-spin coupling constant $J$. This means that a macroscopic number of spins (in any of the models studied) is frustrated, i.e., the factors influencing those spins compensate each other. When the entropy is not zero, there are many sequences whose probabilities may still differ slightly from one another. The MAP characteristics $(c, v)$ then refer to averages over all those equivalent sequences. We will discuss this effect explicitly for the various cases analyzed.
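The entropy formula (38) can be estimated numerically by simulating one long observation sequence, iterating the zero-temperature recursion (32), and counting how often the effective field lands exactly on $\pm J$. The sketch below is an illustration under assumptions: the function name `entropy_density`, the sequence length, the seed and the parameter values are all choices made here, and the empirical frequency is used as a stand-in for the stationary probabilities $\omega(\pm J)$.

```python
import numpy as np

def entropy_density(q, eps, N=20000, seed=0):
    """Monte Carlo estimate of theta = (log 2 / 2) * [omega(J) + omega(-J)]."""
    rng = np.random.default_rng(seed)
    eps = np.asarray(eps, dtype=float)
    J = 0.5 * np.log((1 - q) / q)
    h = 0.5 * np.log((1 - eps) / eps)
    # Hidden binary symmetric chain and its noisy observations.
    steps = np.where(rng.random(N - 1) < q, -1, 1)
    x = rng.choice([-1, 1]) * np.concatenate(([1], np.cumprod(steps)))
    y = np.where(rng.random((eps.size, N)) < eps[:, None], -x, x)
    local = (h[:, None] * y).sum(axis=0)
    # Zero-temperature recursion: A(u) clips u to the interval [-J, J].
    xi_prev, hits = 0.0, 0
    for k in range(N):
        xi = local[k] + min(max(xi_prev, -J), J)
        hits += abs(abs(xi) - J) < 1e-12   # effective field equal to +J or -J
        xi_prev = xi
    return 0.5 * np.log(2) * hits / N
```

With `q = 0.24` and a single low-noise channel (`eps = [0.05]`, so that $h > 2J$) the estimate is zero, while two identical channels (`eps = [0.05, 0.05]`) give a strictly positive value, because every pair of conflicting observations produces an effective field of exactly $\pm J$.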

Finally, we discuss the semi-analytical error analysis developed in this work. For this, let us consider the overlap defined in (22). Recall that the error estimate is given by $-\partial_g f_g|_{g\to 0,\,\beta\to\infty}$, where $f_g$ is defined in (24). The derivation is shown explicitly for a single observation channel in Appendix 8.1. The results are used to solve analytically for the error in single and two observation channels, and are provided in Section 6.1. The formalism can be generalized to multiple observation channels with some tedious algebra. In Appendix 8.1 we show that the overlap is given as,

$$\Delta_k = \sum_{\mathbf{y},\mathbf{s}} p(\mathbf{y},\mathbf{s})\,s_k\left[f(\xi_k) + g(\xi_k)f(\xi_{k+1}) + \dots + g(\xi_k)\cdots g(\xi_{N-1})f(\xi_N)\right] \qquad (39)$$

In the limit $N \to \infty$, $\Delta_k$ is independent of $k$. The average error is related to the overlap as $P_{MAP}\{\mathrm{error}\} = \frac{1-\Delta}{2}$.
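The relation between overlap and error is easy to see on a concrete pair of sequences: for symbols taking values $\pm 1$, each mismatch contributes $-1$ and each match $+1$ to the overlap, so $(1-\Delta)/2$ is the fraction of mismatched positions. A minimal sketch, with the hypothetical helper name `overlap_and_error`:

```python
import numpy as np

def overlap_and_error(s, x_hat):
    """Overlap Delta = mean(s_k * x_k) and the error probability (1 - Delta) / 2."""
    s, x_hat = np.asarray(s), np.asarray(x_hat)
    delta = float(np.mean(s * x_hat))
    return delta, (1.0 - delta) / 2.0
```

For `s = [1, 1, -1, -1]` and `x_hat = [1, -1, -1, -1]` the overlap is 0.5 and the error is 0.25, i.e. one wrong symbol out of four.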



5 Characterization of the recurrent states 

For calculating the quantities of interest we need to obtain the recurrent states for the different multi-channel systems. The recurrent states are found and parameterized in a manner similar to that demonstrated for $L = 1$ in [6]. In this section we analyze two scenarios. In Scenario I, we consider a two-channel system in which the noise in the two channels is varied in a fixed proportion. In Scenario II, we consider the general $L > 2$ channel system, with every channel having the same noise intensity. Below we give the parameterizations of the recurrent states for these two cases separately. In both the scenarios we come across multiple phase
transitions which we denote by m and the noise within any of those phases varies as ^ <h < -^z^- The phase 
transitions are reflected in the study of the order parameters and the error analysis and is discussed in detail 
in the next section.

5.1 Scenario I: Two channels with different noise intensities

The parameter $\alpha = h_2/h_1$ is defined as the ratio between the $h_i$'s (9) of the individual observation channels. The stationary states can be parameterized as $[a_i, \bar{a}_i]$, written in terms of $h$ (depicting the channel with higher noise),

$$a_i = J + (2-i)h + \eta$$
$$\bar{a}_i = -J - (2-i)h + \eta \qquad (40)$$

with $\eta \in \{(\alpha+1)h, (\alpha-1)h, -(\alpha-1)h, -(\alpha+1)h\}$.

The values of $\alpha = h_2/h_1$ are kept as positive integers, since for an arbitrary $\alpha$ (rational or irrational) the number of stationary states increases very fast, which complicates the analysis without adding any generality to the problem. The stationary states in all the phases can be found from the above formula by substituting the values of $i$ from Table 1. The total number of states in the different phases is tabulated in Table 2.



Table 1 The possible values of i for different α at different phases denoted by m. In the table below, (...) implies for (i) α even, all integer values in the range and (ii) α odd, only the even values in the range.

m   α=1        α=2        α=3        α=4        α=5        α=6            α=7       α=8        α=9    α=10
1   2          2          2          2          2          2              2         2          2      2
2   2          2,3        2          2          2          2              2         2          2      2
3   2,4        2,...,4    2,4        2          2          2              2         2          2      2
4   2,4        2,...,5    2,4        2,5        2          2              2         2          2      2
5   2,4,6      2,...,6    2,4,6      2,5        2,6        2              2         2          2      2
6   2,4,6      2,...,7    2,4,6      2,4,5,7    2,6        2,7            2         2          2      2
7   2,...,8    2,...,8    2,...,8    2,...,8    2,...,8    2,7            2,8       2          2      2
8   2,...,8    2,...,9    2,...,8    2,...,9    2,...,8    2,4,7,9        2,8       2,9        2      2
9   2,...,10   2,...,10   2,...,10   2,...,10   2,...,10   2,4,7,9        2,4,6,8   2,9        2,10   2
10  2,...,10   2,...,11   2,...,10   2,...,11   2,...,10   2,4,6,7,9,11   2,4,6,8   2,4,9,11   2,10   2,11

Table 2 Number of states for different α at different phases denoted by m

m     α=1   α=2   α=3   α=4   α=5   α=6   α=7   α=8   α=9   α=10
1     8     8     8     8     8     8     8     8     8     8
2     8     16    8     8     8     8     8     8     8     8
3     16    24    16    8     8     8     8     8     8     8
4     16    32    16    16    8     8     8     8     8     8
5     24    40    24    16    16    8     8     8     8     8
6     24    48    24    32    16    16    8     8     8     8
7     32    56    32    56    32    16    16    8     8     8
8     32    64    32    64    32    32    16    16    8     8
9     40    72    40    72    40    32    32    16    16    8
10    40    80    40    80    40    48    32    32    16    16

5.2 Scenario II: L identical channels

For a multiple observation channel system we can parameterize the stationary states for a given value of $J$ and $h$ within any phase as,

$$b_i = J + (2-i)h + \sum_{l=1}^{L} t_l h$$
$$\bar{b}_i = -J - (2-i)h + \sum_{l=1}^{L} t_l h \qquad (41)$$

with $t_l \in \{+1, -1\}$. The possible values of $i$ for the different phases are given in Table 3. For the various possible phases, the total number of states is given in Table 4. One can notice the significant rise in the total number of states for large $L$ as we go to higher phases.



Table 3 The possible values of i for multiple observation channel systems with the same h at different phases denoted by m. In the table below, (...) implies for (i) L odd, all integer values in the range and (ii) L even, only the even values in the range.

m    L even     L odd
1    2          2
2    2          2,3
3    2,4        2,...,4
4    2,4        2,...,5
5    2,4,6      2,...,6
6    2,4,6      2,...,7
7    2,...,8    2,...,8
8    2,...,8    2,...,9
9    2,...,10   2,...,10
10   2,...,10   2,...,11



Table 4 Number of states for multi observation channels with same h at different phases denoted by m 



m     L=1   L=2   L=3   L=4   L=5   L=6   L=7   L=8
1     4     8     16    32    64    128   256   512
2     8     8     32    32    128   128   512   512
3     12    16    48    64    192   256   768   1024
4     16    16    64    64    256   256   1024  1024
5     20    24    80    96    320   384   1280  1536
6     24    24    96    96    384   384   1536  1536
7     28    32    112   128   448   512   1792  2048
8     32    32    128   128   512   512   2048  2048
9     36    40    144   160   576   640   2304  2560
10    40    40    160   160   640   640   2560  2560
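
The entries of Table 4 follow a simple regularity that can be checked numerically. The sketch below is only an empirical fit read off the tabulated values (it is not a formula stated in the text): for odd L the count appears to be 2^(L+1) m, and for even L it appears to be 2^(L+1) ceil(m/2). The function name `num_states` is hypothetical.

```python
def num_states(m: int, L: int) -> int:
    """Empirical fit to the Table 4 entries (an observation, not a derived result)."""
    base = 2 ** (L + 1)
    if L % 2 == 1:
        return base * m              # odd L: grows with every phase m
    return base * ((m + 1) // 2)     # even L: grows only every second phase


# Rebuild the table for m = 1..10 and L = 1..8.
table4 = {m: [num_states(m, L) for L in range(1, 9)] for m in range(1, 11)}
for m, row in table4.items():
    print(m, row)
```

Under this fit, the even-L columns change only at every second phase, which matches the halving of the number of transition points for even L.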



6 Results 

In this section we describe our results, analyzing different multiple observation channel systems
by evaluating the order parameters and entropy. First, we consider a two observation channel system and vary
the noise in the individual channels. Next, we consider L observation channels with identical noise intensities.
Finally, we compare the performance of a single relatively "clean" channel with multiple "noisy" ones
by bringing in the notion of channel cost, and examine the tradeoffs between the two setups. We attempt
to provide physical intuition behind the results using simple thermodynamic principles
[9], [12], [13], [14], [15]. The system performance is considered for both maximum likelihood (ML) and
maximum a posteriori (MAP) estimation. In ML, the estimate of the current state depends only on
the observations from the different channels at that particular instant. In case of a tie, which arises when the
number of observation channels is even, the state is chosen randomly to be either 0 or 1 with equal probability.
For example, suppose the number of observation channels is L (L being even), with L/2 of the channels having
an observation of 1 and the remaining L/2 an observation of 0. In this case, the state is chosen to be either 0 or
1 at random with probability 1/2.

Fig. 3 (Color online) Order parameters (a) c and (b) v obtained analytically by Ising Hamiltonian minimization (bold line), which
show an exact superposition with the data obtained from the Viterbi algorithm (open squares) for two channels with different
noise for q = 0.24. The smooth decaying lines in the plot for c are obtained via the ML estimate and the horizontal blue line depicts
c_0. (c) v_1 is the overlap between the MAP and ML estimated sequences.
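
The ML rule described above amounts to a per-instant majority vote with a fair coin flip on ties. A minimal sketch, assuming observations are encoded as 0/1 integers; the function name `ml_estimate` is hypothetical:

```python
import random


def ml_estimate(bits, rng=random):
    """ML estimate of one hidden state from the L channel readings at that instant.

    bits: list of 0/1 observations, one per channel.
    Ties (possible only for even L) are broken uniformly at random.
    """
    ones = sum(bits)
    zeros = len(bits) - ones
    if ones > zeros:
        return 1
    if zeros > ones:
        return 0
    return rng.choice((0, 1))  # tie: 0 or 1 with probability 1/2 each


print(ml_estimate([1, 1, 0]))                       # majority is 1
print(ml_estimate([1, 0], rng=random.Random(0)))    # tie, random outcome
```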



6.1 Two-Channel scenario 

In this subsection, we study the performance of a two observation channel system with relatively different 
noise levels. 

Channel 1 has noise ε_1 and channel 2 has noise ε_2. The parameter a defines the relation between the noise
levels in the two channels. Specifically,



1 




Since we always take a ≥ 1 (the case with a < 1 is obtained by simply interchanging the channels), the noise
level in channel 2 does not exceed that in channel 1, so channel 1 is the "noisier" channel. The order
parameters c and v, plotted by varying the noise ε_1 in channel 1 (the plots are made relative to the "noisier"
observation channel) for a fixed spin-spin correlation J (a function of q, as defined earlier), are shown in
Figure 3. We observe different operational regimes that are separated by first order phase transitions. The
point of the first phase transition gradually moves to the right with an increase in a. This indicates that the
overall behavior is dominated by that of the cleaner channel, which is intuitive. In the region before the first
phase transition, the ML and MAP estimates coincide. For MAP estimation, the sequence correlation parameter c
is stable and noise independent before the first phase transition. Afterwards the correlation shows discrete
jumps and goes to the prior dominated value of 1, whereas with the ML estimate c decreases monotonically to 0.
The advantage of MAP estimation over ML is that the MAP estimate is supported by the prior and hence performs
better at intermediate noise ranges. For example, when a = 2, the correlation c degrades faster for ML
estimation and is nearly constant over a large range of noise values for MAP estimation. The overlap v is found
to gradually shift towards 1 before the first phase transition as a increases, implying that the estimated
sequence for all the possible noises is primarily driven by the observations. Before the first phase transition,
we encounter the observation dominated regime except for a = 1. The value of the overlap is not 1 because of the
manner in which the overlap is defined in (21). A clearer way to see the observation dominated regime is to
consider the overlap between the MAP and ML estimated sequences, which is plotted in Figure 3(c). This confirms
that there is no observation dominated regime for a = 1 in 2-channel systems. After the first phase transition
the overlap becomes worse and, at higher noise, v decays towards 0. Increasing a enlarges the observation
dominated regime or, in other words, the region of ML estimation. It is also interesting to note that the
correlation and overlap parameters are stable over a longer range as a increases, but show a rapid rise or
decay, respectively, after the first phase transition. The analytical results are in good agreement with
simulations using the Viterbi algorithm.

Fig. 4 (Color online) The entropy of the system obtained analytically with varying ε (ε denoting the higher noise level) for two
observation channels at different noise ratios given by a at q = 0.24. Panels: (a) a = 1, (b) a = 2, (c) a = 4.

Fig. 5 (Color online) Regions of zero (blue) and non-zero (white) entropy shown in the ε_1 versus ε_2 plot (noise values in channels 1
and 2, respectively) for different values of the parameter q. Panels: (a) q = 0.1, (b) q = 0.24, (c) q = 0.4.

Let us now focus on entropy. We see that the addition of a second observation channel with the same
noise as the first results in non-zero entropy for all possible values of ε. With the introduction of the second
observation channel, the fields h_2 and h_1, being of the same strength, mutually cancel each other. This
gives rise to multiple degenerate ground states for the acting external field, and hence multiple solutions are
obtained from MAP estimation, resulting in non-zero entropy [2]. However, as we increase a, it can be seen
from Figure 4 that a region of zero entropy is obtained again and the point of the first phase transition shifts
to the right. After the first phase transition, the entropy rises, attains a maximum and then decays. The
reason behind this is discussed below, where we examine the regions of zero and non-zero entropy
obtained for a given parameter q.

Knowing the regions of zero and non-zero entropy is of particular interest for the two observation channel
system. We derive these regions for the possible values of the noise ε in the two observation channels for a
given q, and they are plotted in Figure 5. We can see qualitatively that along the ε_1 = ε_2 line, which
defines a = 1, we can never obtain a unique sequence from MAP estimation. This is due to the prevalence of
degenerate ground state solutions at zero temperature, caused by the mutual nullification of the opposing
fields in the two observation channels. As we perturb the external random field applied in one of the observation
channels by a certain amount (depending on the value of q), we migrate to the region of zero entropy. The
region above the ε_1 = ε_2 line corresponds to a > 1, and at the zone boundary between zero and non-zero
entropy we have h_1 + 2J = h_2. Denoting by ε_{1b} and ε_{2b} the values of ε_1 and ε_2 at the zone
boundary, we have



1 - £lb £lb 



(42) 



(The zone boundary for the region below the ε_1 = ε_2 line can be obtained by interchanging ε_{1b} and ε_{2b} in (42).)

Before h_2 reaches the point of the first phase transition, the prior 2J and the noisier channel field strength h_1
combine to cancel out the effect of the clean channel field h_2, and the spins in certain clusters of the Ising chain
are frustrated, which explains why we have non-zero entropy at zero temperature. Once h_2 crosses the
zone boundary, this mutual cancellation is no longer effective and we obtain the zero entropy regions. Thus, only
for certain (h_1, h_2) can we attain the zero entropy condition in the thermodynamic limit.




Fig. 6 (Color online) Plot of error from (a) MAP estimates for 1- and 2-observation channel systems
with the same noise in each channel (analytical and simulated), and (b) ML (blue) and MAP (red) estimates
obtained by varying a in the two channels (simulated), for q = 0.24.

We will now find the error between the actual and estimated sequences. In Figure 6(a) the error plots are
made for (a) a single observation channel and (b) two observation channels with a = 1. The estimates are
obtained semi-analytically using the MAP technique (see (39), (53)). The details of the derivation, with further
analysis, are provided in Appendix 8.1.

In Figure 6(b) the ML and MAP error estimates are plotted using the Viterbi algorithm. Here we have
studied the system at higher values of a (the semi-analytical treatment described in Appendix 8.1 can be
extended to systems with high a, at the price of tedious calculations). We notice from ML as well as MAP
estimation (which in this case is close to the ML estimate, although the performance depends heavily on q) that
the addition of a cleaner channel (i.e., increasing a) results in a reduction of the error. For ML, the result is mainly
driven by the cleaner channel. The error due to ML estimation is simply the probability of error in the cleaner
channel, and is given as

P^,{error} = j^-l^ (43) 

For higher a, the error is small for low and intermediate noise values, but at higher noise values the error
increases rapidly and becomes comparable to the error at lower a. We also notice that MAP estimates have lower
error relative to ML at smaller a for the particular value of q studied. However, as we increase a, for example
to a = 8, the ML and MAP estimation errors become indistinguishable for any value of ε.

6.2 L-channels with identical noise 

In this subsection, we study a system of L observation channels with identical noise, for a fixed positive value
of the correlation coefficient J. The correlation c found from the MAP estimate matches exactly with the
ML estimate in regions of small noise, as can be seen from Figure 7(a). For intermediate values
of noise, however, c from MAP estimation is more stable for all values of L, whereas its value obtained from the
ML estimate decreases monotonically as the noise increases. (The ML estimates for L = 1 and L = 2 coincide,
as can be easily seen from (44) and (45).)

Interestingly, for higher L, c is stable even before the first phase transition at lower noise regimes. This
can be seen by comparing the value of c with the reference value c_0 of the Markov process. c_0 and c_ML are
given below.

\[ c_0 = \sum_{x_1, x_2} x_1 x_2 \, p_{st}(x_1) \, p(x_2|x_1) = 1 - 2q \]
\[ c_{ML} = (1 - 2q)\,(1 - 2 P_{ML}\{\mathrm{error}\})^2 \tag{44} \]

P_ML{error}|_L provides the error estimate for L even or odd and is given in (45). With increase in noise
the correlation shows jumps in its value. These jumps become smaller and appear at closer intervals with
increasing noise intensity, and c saturates to the prior dominated value of 1. The overlap between the observed
and estimated sequences is plotted in Figure 7(b). As in the case of 2-channel systems,
the overlap between the MAP and ML estimated sequences is plotted in Figure 7(c). This gives a clearer
indication of the observation dominated regimes for different L. With an increase in noise there is a gradual
drop in overlap with discrete jumps at the points of phase transition, tending towards 0 at high noise. However,
on adding more channels to the system we find that the overlap shows a gradual monotonic decay. All the
above analytical calculations are supported by simulations obtained by running the Viterbi algorithm, which are
plotted along with the analytical data for comparison.

Fig. 7 (Color online) Order parameters (a) c and (b) v obtained analytically by Ising Hamiltonian minimization (bold line), which
show an exact superposition with the data obtained from the Viterbi algorithm (open squares) for L channel systems with the same
noise for q = 0.24. The smooth decaying lines in the plot for c are obtained via the ML estimate and the horizontal blue line depicts
c_0. (c) v_1 is the overlap between the MAP and ML estimated sequences.

Fig. 8 (Color online) The entropy of (a) odd and (b) even channel systems obtained analytically with varying ε (ε is the same in all
the channels of the L channel system) at q = 0.24.
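
For readers who want to reproduce such simulations, the following is a minimal Viterbi sketch for the L-channel binary case. It assumes spins and observations take values ±1, that the energy to be minimized is H = -J Σ x_k x_{k+1} - h Σ_k x_k Σ_l y_k^l, and that J = (1/2) log((1-q)/q) and h = (1/2) log((1-ε)/ε); these parameterizations and the names `couplings` and `viterbi_map` are assumptions made for illustration and are not taken from this section. Ties between equally good paths are broken arbitrarily, so only one of possibly many MAP sequences is returned.

```python
import math


def couplings(q, eps):
    """Assumed mapping from flip probability q and channel noise eps to (J, h)."""
    J = 0.5 * math.log((1 - q) / q)
    h = 0.5 * math.log((1 - eps) / eps)
    return J, h


def viterbi_map(obs, J, h):
    """Return one sequence x in {+1,-1}^N maximizing J*sum(x_k x_{k+1}) + h*sum(x_k * sum_l y_k^l).

    obs[k] is the list of the L channel readings (+1 or -1) at time k.
    """
    states = (+1, -1)
    fields = [h * sum(y_k) for y_k in obs]          # net external field at each site
    score = {s: fields[0] * s for s in states}      # best score of a path ending in s
    back = []                                       # back-pointers for each later step
    for f in fields[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda sp: score[sp] + J * s * sp)
            new_score[s] = score[best_prev] + J * s * best_prev + f * s
            pointers[s] = best_prev
        back.append(pointers)
        score = new_score
    s = max(states, key=lambda st: score[st])
    path = [s]
    for pointers in reversed(back):                 # trace the optimal path backwards
        s = pointers[s]
        path.append(s)
    path.reverse()
    return path


J, h = couplings(q=0.24, eps=0.05)                  # here h > 2J (low noise)
y = [[+1], [+1], [-1], [+1], [-1], [-1]]            # a single channel, six time steps
print(viterbi_map(y, J, h))
```

In the low-noise example above (h > 2J, single channel) the returned sequence coincides with the observations, which is the observation dominated regime discussed in the text.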

We now focus on the entropy of the system, defined as the natural logarithm of the number of MAP
solutions that we can possibly obtain. The entropy is plotted in Figure 8.

- When the number of channels in the system is odd, then for small values of noise (in the ML dominated
regime) there is a unique solution to the MAP estimation problem. When varying the noise, the system
undergoes first-order phase transitions at the points given by h = 2J/m, with m a positive integer. In
particular, the entropy becomes non-zero at the first phase transition, h = 2J, signaling exponentially many
solutions to the MAP estimation problem. At each phase transition point there is a discrete jump in entropy.
The magnitude of these jumps diminishes with increasing L. However,
the position of the phase transitions is independent of the number of channels. Mapping the system to an Ising
spin model, we can see that there are L + 1 effective forces acting on a spin at a particular instant: the L
magnetic fields due to the observation channels and one due to the spin-spin interaction. At the point of the
first phase transition, the prior 2J (quantifying the spin-spin interaction) nullifies the effect of one of the
channels, whereas the magnetic fields from the remaining even number of channels compensate each other
due to mutually conflicting observations.
- For even L, there are no regions of zero entropy and thus for all possible values of L, there are exponentially
many MAP solutions corresponding to a typical observation sequence. This is due to the fact that for an
even number of channels with the same noise, the magnetic fields acting on the spin mutually compensate
each other, resulting in macroscopic frustration. Due to this, for any value of noise we have non-zero
entropy. The rise is again found to be more gradual for large even L values. The points of phase
transition when L is even are given by h = J/m. Thus for these systems the number of phase transition points
is reduced by half relative to odd L.

Fig. 9 (Color online) Plot of error with ML (smooth) and MAP (wiggled) estimates obtained by varying the number of channels for
q = 0.24.
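
The degeneracy behind the non-zero entropy can be illustrated on very short chains by exhaustive enumeration. The sketch below assumes the Ising energy H = -J Σ x_k x_{k+1} - h Σ_k x_k Σ_l y_k^l with ±1 variables; the function names and the numerical values of J and h are illustrative assumptions, and brute force is only feasible for small N.

```python
import itertools
import math


def energy(x, obs, J, h):
    """Ising energy of configuration x given per-site channel readings obs (assumed form)."""
    bond = sum(x[k] * x[k + 1] for k in range(len(x) - 1))
    field = sum(x[k] * sum(obs[k]) for k in range(len(x)))
    return -J * bond - h * field


def count_map_solutions(obs, J, h, tol=1e-9):
    """Number of minimum-energy (MAP) configurations, found by enumerating all 2^N."""
    energies = [energy(x, obs, J, h)
                for x in itertools.product((+1, -1), repeat=len(obs))]
    e_min = min(energies)
    return sum(1 for e in energies if e - e_min < tol)


J, h = 0.576, 1.472                                  # illustrative values with h > 2J
n_odd = count_map_solutions([[+1], [-1], [+1], [+1]], J, h)
n_even = count_map_solutions([[+1, +1], [+1, -1], [-1, -1]], J, h)
print(n_odd, math.log(n_odd))                        # single channel, low noise
print(n_even, math.log(n_even))                      # two channels, one conflicting site
```

In the two-channel example the middle site sees conflicting observations, so its net field vanishes and both orientations give the same energy; this is the mechanism of frustration described in the list above.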

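Finally, in Figure 9 we show the accuracy of MAP estimation in the multi observation channel system,
obtained using the Viterbi algorithm. The same quantity is also calculated using the ML estimate (plotted for
comparison) and is given by the following formula. (The ML estimates for L = 1 and L = 2 coincide, as can be
easily seen from the estimation error provided in (45) for L odd and even.)

\[ P_{ML}\{\mathrm{error}\}|_{L\ \mathrm{odd}} = 1 - \sum_{i=0}^{(L-1)/2} \binom{L}{i} \epsilon^{i} (1 - \epsilon)^{L-i} \]
\[ P_{ML}\{\mathrm{error}\}|_{L\ \mathrm{even}} = 1 - \sum_{i=0}^{L/2-1} \binom{L}{i} \epsilon^{i} (1 - \epsilon)^{L-i} - \frac{1}{2}\binom{L}{L/2} \epsilon^{L/2} (1 - \epsilon)^{L/2} \tag{45} \]

The estimation error is found to decrease with the addition of more observation channels.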
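
The ML error for L identical binary symmetric channels can be evaluated directly from the binomial distribution of the number of flipped observations, with half of the tie probability counted as an error for even L. The sketch below follows that majority-vote reasoning; the function name `p_ml_error` is hypothetical.

```python
from math import comb


def p_ml_error(L, eps):
    """ML (majority vote) error for L identical binary symmetric channels with noise eps.

    A vote is correct when fewer than half of the channels are flipped; for even L
    an exact tie is resolved correctly with probability 1/2.
    """
    def prob_flips(i):
        return comb(L, i) * eps**i * (1 - eps)**(L - i)

    if L % 2 == 1:
        return 1.0 - sum(prob_flips(i) for i in range((L - 1) // 2 + 1))
    half = L // 2
    return 1.0 - sum(prob_flips(i) for i in range(half)) - 0.5 * prob_flips(half)


for L in (1, 2, 3, 4, 8):
    print(L, p_ml_error(L, 0.2))
```

Evaluating the function for L = 1 and L = 2 gives the same value, consistent with the remark that the ML estimates for one and two channels coincide.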



6.3 One "clean" vs multiple noisy channels 

In this section, we bring in the notion of channel cost and use it to compare the performance of a single channel
system with a multi-channel system while keeping the cost the same. Channel cost can be interpreted as a function
of the channel noise ε. In many scenarios, it is more expensive for a system designer to build a channel
with small noise than one with higher noise. For example, in realistic channels for data communication,
thermal noise at the receivers is a major source of channel noise, and building a receiver with low thermal
noise is expensive. The binary symmetric channels considered here can also be used to model a power plant
producing equipment, where the noise is interpreted as the probability of producing a defective item.
To make the equipment less defective, a system engineer would need expensive machines and good
maintenance, which increases the cost. Because of this inverse relation between channel cost and channel
error, for simplicity we model the channel cost as being inversely proportional to the channel error and
linearly proportional to the number of channels in the system. Thus, for an L-channel system, the channel cost
is given by log(L/ε).




Figures 10(a)-10(c) give a comparison of cost vs. performance for various multi-channel systems with
varying q. The cost is plotted as log(L/ε), while the performance is measured in terms of error. A point on the
curve with a specific error value, say E, indicates the channel cost incurred in order to build a system that
can tolerate a maximum error of E. It is easy to see that if the channel cost were inversely proportional to
ε alone, then increasing L would always result in a better system for a given tolerated error.
However, because of the linear dependence on the number of channels L, there is a certain value of E, say E_th,
after which the cost required to build an L channel system becomes higher than that required for a single
channel system in order to attain the same level of performance in terms of error requirements.

In line with our argument, we observe from the plots that in order to build a system with small error, it
is advisable to increase L in order to minimize the channel cost. However, if we can tolerate a larger error, it
is cheaper to go for a single channel system. For example, when the error is small, the cost required to
build an 8-channel system is less than that required to build a single or a 2-channel system. This is true for all
values of q. However, when the error is large, the single channel system is preferable, since the
cost required is much lower than that needed to build a system with L > 1.
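
The cost model itself is simple enough to state in a few lines. The sketch below assumes the cost log(L/ε) described in this section; `channel_cost` and `equal_cost_noise` are hypothetical helper names, and the snippet only illustrates the cost bookkeeping, not the MAP error curves of Figure 10.

```python
import math


def channel_cost(L, eps):
    """Cost of an L-channel system with per-channel noise eps, modeled as log(L / eps)."""
    return math.log(L / eps)


def equal_cost_noise(L, eps_single):
    """Per-channel noise an L-channel system can afford at the cost of one channel with eps_single."""
    return L * eps_single


eps_single = 0.05
for L in (2, 4, 8):
    eps_multi = equal_cost_noise(L, eps_single)
    print(L, eps_multi, channel_cost(L, eps_multi), channel_cost(1, eps_single))
```

At fixed cost, an L-channel system can therefore use channels that are L times noisier than the single channel; whether that trade pays off depends on the resulting estimation error, which is what the figures compare.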



6.4 Single observation channel with Gaussian noise

A detailed study of the single channel with a Gaussian observation noise model is relegated to Appendix 8.2.
Here, we only provide the results for the order parameters c and v, which are plotted in Figure 11. We also
show v_1, which is the overlap between the ML and MAP estimated sequences. The plots are continuous and no
first order phase transitions are observed, in contrast with the discrete noise model considered earlier. We also
find that the entropy is zero for this case (see Appendix 8.2 for details), meaning we do not have exponentially
many solutions for a given observation sequence. An important point to note is that in the discrete case, zero
entropy signified an observation dominated regime, where the ML and MAP estimates coincided. However,
in the Gaussian case, we have zero entropy at all values of σ, but this does not mean that the ML and MAP
estimates coincide everywhere. This can be seen from v_1, which goes monotonically to zero, indicating
that the correlation between the ML and MAP estimates reduces with an increase in the variance σ^2.



Here we take the log of L/ε to define the cost because for small error L/ε is exponentially large.

Fig. 11 (Color online) Order parameters c (in blue) and v (in red) plotted against σ from the analytically obtained data from Ising
Hamiltonian minimization (bold line) and from the Viterbi algorithm (open circles) for the case of a single observation channel with
Gaussian noise for q = 0.24. v_1 is the overlap between the MAP and ML estimated sequences.



7 Conclusion 

In this paper we have presented an analytical study of Maximum a Posteriori (MAP) estimation for Multi- 
channel hidden Markov processes. We have considered a broad class of systems with odd and even number 
of channels (having the same noise intensities) to understand the MAP characteristics in analogy with the 
thermodynamic quantities. In all the system models studied here, we observe a sequence of first-order phase 
transitions in the performance characteristics of MAP estimation as one varies the noise intensity. Remarkably, 
the position of the first phase transition depends only on whether L is odd or even, but not on its value. 

In the systems with an odd number of channels, there is a low noise region where the MAP estimation problem
has a uniquely defined solution, as characterized by the vanishing zero-temperature entropy of the corresponding
statistical physics system. At a finite value of noise, the system experiences a first order phase transition
and the number of solutions with posterior probability close to the optimal one increases exponentially. In
contrast, in systems with even number of channels, the MAP estimation always yields an exponentially large 
number of solutions, at all noise intensities. This is explained by drawing an analogy with the thermodynamic 
system where the spins experience macroscopic frustration due to contradicting observations from different 
channels. Our results indicate that for a system with L = 2 observation channels, one can recover the region 
of zero entropy by introducing noise-asymmetry in the channels. In addition to the binary symmetric channel 
we have also considered the Gaussian observation channel, and demonstrated that the corresponding system 
has a vanishing zero-temperature entropy, indicating a unique MAP solution. 

Finally, we analyze the tradeoff between system cost and estimation error for L-channel systems, by
assuming an inversely proportional relationship between channel cost and channel noise. Our results suggest 
that if the objective is to achieve low estimation error, then it is more advantageous to build a system with 
larger number of noisier channels, rather than having a single channel with better noise-tolerance. However 
if we can tolerate higher error, it is more beneficial to use a single channel system. An exception is noticed 
for the 2-channel system whose performance relative to a single channel system is dependent on the spin- 
spin interaction (J). For moderate J the estimation error for both the single and two channel systems are 
comparable while at lower J the single channel system is found to perform better to achieve any degree of 
accuracy. 

There are several directions for extending the work presented here. For instance, it will be interesting 
to generalize the analysis presented here beyond the binary HMMs, e.g., by reducing the problem to a gen- 
eralized Potts model. Note that the critical behavior observed here is due to two competing tendencies, (a) 
accommodating observations and (b) hidden (Markovian) dynamical model. Thus, it is natural to assume that 
similar behavior can be expected in non-binary systems as well. Another interesting problem is to "break" the 
macroscopic degeneracy of the MAP solution space by adding additional constraints and/or objectives. For 
instance, among all the MAP solutions, one might wish to select the one that has the highest overlap with the 
typical realization of the hidden process, which might be useful in the context of parameter learning [16].

8 Appendix

8.1 Analytical error calculation using MAP 

To find the MAP error expression analytically for a single observation channel, we use a modified
Hamiltonian (23). To calculate the error estimate we need to find -∂_g f_g |_{g→0, β→∞}, where f_g represents
the free energy of the modified Hamiltonian and is given by

1 ^ , , [pi 
= -^Lviy^^)t^iik) (46) 

with ξ_k = h y_k + g s_k + A(ξ_{k-1}). Note that the limits are taken in order, i.e., first we take the limit g → 0 and
then β → ∞. Using the recursion relations (27) and (28) we get ∂_g f_g as

\[ \partial_g f_g = -\frac{1}{N} \sum_{y,s} p(y,s) \sum_{k=1}^{N} \frac{1}{2}\{\tanh(\beta(\xi_k + J)) + \tanh(\beta(\xi_k - J))\}\{s_k + \partial_g A(\xi_{k-1})\} \tag{47} \]

with the term ∂_g A(ξ_{k-1}) given by

\[ \partial_g A(\xi_{k-1}) = \frac{1}{2}\{\tanh(\beta(\xi_{k-1} + J)) - \tanh(\beta(\xi_{k-1} - J))\}\{s_{k-1} + \partial_g A(\xi_{k-2})\} \tag{48} \]

Taking the limits and simplifying, we can write the terms in the above expressions as

\[ \tanh(\beta(\xi_k + J)) + \tanh(\beta(\xi_k - J)) = -2\delta(\xi_k < -J) + 2\delta(\xi_k > J) \]
\[ \tanh(\beta(\xi_{k-1} + J)) - \tanh(\beta(\xi_{k-1} - J)) = 2\delta(-J < \xi_{k-1} < J) \tag{49} \]
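
The zero-temperature limits of the two tanh combinations can be checked numerically by taking β large. The sketch below uses an arbitrary illustrative value J = 0.5 and treats δ(·) as an indicator function; the function names are hypothetical.

```python
import math


def tanh_sum(xi, J, beta=1e3):
    """tanh(beta(xi+J)) + tanh(beta(xi-J)): tends to +2, 0 or -2 as beta grows."""
    return math.tanh(beta * (xi + J)) + math.tanh(beta * (xi - J))


def tanh_diff(xi, J, beta=1e3):
    """tanh(beta(xi+J)) - tanh(beta(xi-J)): tends to 2 inside (-J, J) and 0 outside."""
    return math.tanh(beta * (xi + J)) - math.tanh(beta * (xi - J))


J = 0.5
for xi in (-1.0, 0.0, 1.0):
    print(xi, tanh_sum(xi, J), tanh_diff(xi, J))
```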

Using this we can write, 

- <3g/glg^o,;3^. = ^ Ip(y,s) I [-5(1* > J) + 5(1* < -msk + 5{-J< < 7)[.*_i + ...]...] 
' y,s k=l 

= i Ip(y^«) I [fmh+g{^k-i)h-i +g{^k-2)W-2 ...]...] 

y,s k=l 

= ilP(y^^) I ^*[/(^A.) +g(^A-)/(^*+l) + • • -+8^ . ■■8i^N-l)mN)] 

" y,s k=l 

= ^ I (^IP(y>s)^a/(&) +g(^*)/(^A.+ i) + . . ■+8{^k) . ■■8{^N-i)MN)?j (50) 

where f(ξ_k) = δ(ξ_k > J) - δ(ξ_k < -J) and g(ξ_k) = δ(-J < ξ_k < J). (The approach can be extended to
multi-channel systems with tedious algebra.) We outline the process of evaluating a
particular term of the inner series, say, the term

\[ \sum_{y,s} p(y,s)\, s_k\, g(\xi_k) \cdots g(\xi_{k+n-1})\, f(\xi_{k+n}) \]


L P{y,s)Skg{^k)---8{^k+n-\)f(.^k+n) 
yi,...,y^;si.,..,si, 

yk+ 1 1 ■ ■ ■ ,.y*+n 1 ■ • >*/t+n 

= 12 P{^k,Sk,yk+U--- ,yk+,uSk+l , ■ ■ ■ , ■ • ■ gi^k+n-l)f{^k+n) 

^k+1 f-,^k+n 

= L P{^k+l,---,Sk+n,yk+\,---,yk+n\^k,Sk)p{^k,Sk)Skg{^k)---g{^k+n-l)f{^k+n) (51) 

ik,Sk\^k+l<---,^k+n 
SA-+lvA+n 

where in (51) we replace the y_k's in the sum with ξ_k's, since each sequence of y_k's corresponds to a sequence of
ξ_k's (for a two channel system there will be multiple y_k's corresponding to a ξ_k, and this is treated accordingly),
and

\[ p(\xi_k, s_k) = \sum_{y_k} \omega(\xi_k, s_k, y_k) \tag{52} \]

It is easy to see that the term 

^k = Ip(y,s)..[/(^.) +g{^k)m+i) + . ■ .+g{^k) . ■ ■g{^N-i)f{M 
y>s 

which is independent of k for N ^ °°. For our calculation we approximate the error by keeping only the first 
four terms in the expression for Ak, 

l^p{y,s)sk[m)+g{^k)m+i)+giM^k+i)g{^k+2)m+3)+g{^^^^^ 

y,s 

In the thermodynamic limit, we drop the subscript k which gives 

-'^gfg\g^o,l}^^ = ^ 
Hence, A is the overlap between Sk and yk, and estimation error is calculated as. 



PMAp{error} = (53) 

Here we present results showing the correspondence of the simulated and the analytical data obtained in
the ranges of the first three phase transitions, h > 2J, 2J > h > J and J > h > 2J/3. In Figure 6 we have already
used this semi-analytical expression to estimate the MAP error for 1 and 2 channel systems at a particular
value of q. In Figure 12 we plot the error calculated for a single observation channel at different values of q.
On truncation of the infinite series (39) at the fourth term we see an exact match between the analytical and
the simulated data for larger values of q. However, around the third phase for q = 0.1,
the match is not exact. This implies that as we go to smaller values of q, we need to evaluate higher order
terms of the infinite series in order to match the simulated data exactly.

Fig. 12 (Color online) Estimation error plotted from the MAP estimate using simulation (red) and the analytical formula (blue) for a
single observation channel. Panels: (a) q = 0.1, (b) q = 0.24, (c) q = 0.4.



8.2 Gaussian observation model 



Now we consider a single Gaussian observation channel. The Ising Hamiltonian for the case of a Gaussian
observation channel is given by

\[ H_G(y, x) = -J \sum_{k=1}^{N-1} x_k x_{k+1} - h \sum_{k=1}^{N} x_k y_k \tag{54} \]

where h = 1/σ^2, σ^2 being the variance of the Gaussian noise. The noise distribution in the observation channel is
given by

\[ \pi(y_k|x_k) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_k - x_k)^2}{2\sigma^2}} \tag{55} \]

Using the above equation with p(y|x) = \prod_{k=1}^{N} \pi(y_k|x_k) we get, after discarding the irrelevant additive terms,

\[ \log p(y|x) = \sum_{k=1}^{N} \frac{x_k y_k}{\sigma^2} \tag{56} \]

The recursion relation is given as

\[ \xi_k = h y_k + A(\xi_{k-1}) \tag{57} \]

where the generic form of the function A(ξ) is defined in (27). It is easy to see that -J ≤ A(ξ) ≤ J. Thus, ξ
(which we refer to as our state) takes the form nJ + hy where n ∈ {-1, 0, 1}. Since the states having n = 0
are non-recurrent (see the argument in Section 4), we can quantify our state space in the form ±J + hy, where
y ∈ (-∞, ∞). Thus, our state space is now continuous, as opposed to being discrete in the case when the noise
channel was binary.
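
The recursion (57) is easy to iterate numerically. The sketch below assumes the zero-temperature form of A(ξ) implied by the cases worked out later in this appendix (A(ξ) = J for ξ > J, A(ξ) = ξ for -J < ξ < J, A(ξ) = -J for ξ < -J), a hidden ±1 chain that flips with probability q at each step, and an initial state ξ_0 = 0; the function names and parameter values are illustrative assumptions.

```python
import random


def A(xi, J):
    """Assumed zero-temperature form of A: xi clipped to the interval [-J, J]."""
    return max(-J, min(J, xi))


def simulate_states(N, q, sigma, J, seed=0):
    """Iterate xi_k = h*y_k + A(xi_{k-1}) with h = 1/sigma^2 for Gaussian observations."""
    rng = random.Random(seed)
    h = 1.0 / sigma**2
    x = rng.choice((+1, -1))       # hidden spin
    xi = 0.0                       # assumed initial state
    out = []
    for _ in range(N):
        if rng.random() < q:       # hidden chain flips with probability q
            x = -x
        y = rng.gauss(x, sigma)    # Gaussian observation centered on the hidden spin
        xi = h * y + A(xi, J)
        out.append((xi, y))
    return out, h


J = 0.576
states, h = simulate_states(N=1000, q=0.24, sigma=1.0, J=J)
print(max(abs(xi - h * y) for xi, y in states))   # never exceeds J
```

The printed quantity is the largest value of |ξ_k - h y_k| = |A(ξ_{k-1})| along the run, which stays within J as stated above.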

The conditional probabilities for the Gaussian distribution in the observation channel can be written as

\[ \omega(\xi, y, z \,|\, \xi', y', z') = p(z|z')\, \pi(y|z)\, \varphi(\pm J + hy \,|\, \pm J + hy', y) \tag{58} \]

The exact expression for (58) is tabulated below and is used to calculate the state transition probabilities.

Table 5 Conditional probabilities for the Gaussian distribution of the observation channels, where ω(ξ,y,z|ξ',y',z') = ω(aJ +
hy, y, z | cJ + hy', y', z') and the combinations of (a,z), (c,z') are tabulated.



{a,z)l,{c,z')- 


> (1,1) 


(1,-1) 


(-1,1) 


(-1,-1) 


(1,1) 
(1,-1) 


£\i = l 




^%'=^ 




(-1,1) 




_2_r4| 

1-9^ = l 






(-1,-1) 
The quantities used in the table are given as,



1 



1 



1 



1 



{\-q)e "^^/(y>0)+ (l-g)e 2<'- / — — < < 



-27 



27 



(l-^)e ^l^/(y<0) + 



27ra 



(y-/-;')^ / , 27 
(l-^)e 0<y'<=^ 
Ina \ ■ h 



h2 



-27 



(59) 



Fig. 13 (Color online) Plots for p_st for q = 0.24: (a) PDF of p_st at σ = 0.1, (b) CDF of p_st at σ = 0.1.

Fig. 14 (Color online) Plots for p_st for q = 0.24.



We provide an explanation of how to calculate the entries corresponding to the 1st row and the 1st and 3rd columns
of Table 5.

- (a,z,c,z') = (1, 1, 1, 1): Qualitatively, this means that we were in a state ξ' = J + hy' with z' = 1 and we
moved to ξ = J + hy with z = 1. From the conditional probability expression (58), we need to compute

\[ \omega(\xi = J + hy, y, 1 \,|\, \xi' = J + hy', y', 1) = p(1|1)\, \pi(y|1)\, \varphi(J + hy \,|\, J + hy', y) = (1 - q)\, \pi(y|1)\, \varphi(J + hy \,|\, J + hy', y) \]

Now, we can go from J + hy' to J + hy in two ways:
  - When y' > 0 (we always assume h > 0), A(J + hy') = J and ξ = A(ξ') + hy = J + hy, so we should
have y as the channel observation. Thus, we have

\[ \pi(y|1)\, \varphi(J + hy \,|\, J + hy', y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}} \tag{60} \]

  - When -2J/h < y' < 0, A(J + hy') = J + hy', so an observation y would give ξ = A(ξ') + hy = J + hy' + hy.
This means that if we want to have ξ = J + hy, our observation should be y - y', since this will give
ξ = A(ξ') + h(y - y') = J + hy' + h(y - y') = J + hy. Thus, we have

\[ \pi(y|1)\, \varphi(J + hy \,|\, J + hy', y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y - y' - 1)^2}{2\sigma^2}} \tag{61} \]

Combining (60) and (61), we get

\[ \omega(\xi = J + hy, y, 1 \,|\, \xi' = J + hy', y', 1) = p(1|1)\, \pi(y|1)\, \varphi(J + hy \,|\, J + hy', y) \tag{62} \]
\[ = \frac{1}{\sqrt{2\pi}\,\sigma} (1 - q)\, e^{-\frac{(y-1)^2}{2\sigma^2}}\, I(y' > 0) + \frac{1}{\sqrt{2\pi}\,\sigma} (1 - q)\, e^{-\frac{(y - y' - 1)^2}{2\sigma^2}}\, I\!\left(-\frac{2J}{h} < y' < 0\right) \tag{63} \]

- (a,z,c,z') = (1, 1, -1, 1): Qualitatively, this means that we were in a state ξ' = -J + hy' with z' = 1 and
we moved to ξ = J + hy with z = 1. Again, using the conditional probability expression (58), we have

\[ \omega(\xi = J + hy, y, 1 \,|\, \xi' = -J + hy', y', 1) = p(1|1)\, \pi(y|1)\, \varphi(J + hy \,|\, -J + hy', y) = (1 - q)\, \pi(y|1)\, \varphi(J + hy \,|\, -J + hy', y) \]

Now, we can go from -J + hy' to J + hy in the following manner. When y' > 2J/h, A(-J + hy') = J and
ξ = A(ξ') + hy = J + hy, so we should have y as the channel observation. Thus, we have

\[ \pi(y|1)\, \varphi(J + hy \,|\, -J + hy', y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}} \tag{64} \]

We thus have from (64)

\[ \omega(\xi = J + hy, y, 1 \,|\, \xi' = -J + hy', y', 1) = p(1|1)\, \pi(y|1)\, \varphi(J + hy \,|\, -J + hy', y) \tag{65} \]
\[ = (1 - q)\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}}\, I\!\left(y' > \frac{2J}{h}\right) \tag{66} \]

The other entries in Table [5]can be computed in a similar manner. The expressions for the stationary states 
can be obtained from that of the conditional probabilities as, 

J^JJ^^(o{^,y,z\^',y',z')pA^',y',z)d^'dy'dz'=pA^,y,z) (67) 

Now let us define for convenience, 

co{^ = aJ + hy,y,z\^' = cJ + hy' ,y' ,z') = Y{a,y,z\c,y' ,z') (68) 
Psi{^ =aJ + hy,y,z) = S{a,y,z) (69) 

With the above notations we can conveniently write the coupled integral expressions for finding the stationary 
probabilities as, 

(70) 



S{a,y,z)= Z L f r{a,y,z\c,y',z')E{c,y',z')dy' 
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It can be seen that there are four combinations of a and z, i.e., {{a,z) = (1,1),(1, — 1),( — 1,1), (—1,-1)}. 
Hence, we get four coupled integral equations as follows 

S(l,.y,l)= / ^^{l--q)e 1^ E{l,y' ,l)dy' + ^^^^{l-q)e E{l,y' ,l)dy' 

./v'=o \/2na ■Jy'=^ vino 



E{\^y'-l)dy' + j ^^^-^qe-^^^E{l,y'-\)dy' 



Jy'=Q s/lna ■>y=^^ 
+ r -^{\-q)e~'^Z{-\,y'A)dy' 



S(l,);,-1)=/ -^qe ^E{l,y',l)dy'+ „ ^5^^^ ^S(l,y,l)^/y 

/■°° 1 _(z±i)i , , 1 (y-y'+if 

+ -j^(l-q)e-^3{l,y'-l)dy'+ ^^{l ^ q)e-^^ S{l,y\ -l)dy' 
Jy'=o \/2%a •'y'=nr v^na 

r 1 (-'■+'.)^ 
+ / ^S(-l,y,lKv' 
■'y'=T w2na 

+ / ^^{\-q)e-^E{-\,y'-\)dy' 



s(-i,);,i)=/ ^^(i-^)e-^s(-i,y,i)t/y+ / ^^(i-^)e ^^s(-i,y,ivy 

.yy=-oo V27ra ./v'=o v27ra 

+ / ^^qe ^Ei~l,y',-l)dy'+ ^^qe ^ E{-\,y' -l)dy' 

+ /" ^=^(i-^)e"^s(i,y,ivy 

jy'=-^ y/2na 
+ / ^^qe-^E{iy-l)dy' 



s(-i,);,-i) = / ^^^e s(-i,y,ivy+ / s(-i,y,ivy 

.yy'=-oc v27ra iy=o V2na 

fO I (y+if , /"¥ 1 (y-y'+if 

+ ^^{l-q)e-^E{-l,y',-l)dy'+ - ^)e-^^S(-l,y, - 

Jyi=-^ \/2no Jy'=o \/27ta 

+ / ^^^e ^s(i,y,i)^/y 

/"T 1 (.''+1)^ 

+ / ^=^(i-9)e-^s(i,y,-i)^/y 



These equations are solved numerically to find $p_{st}(\xi, y, z)$ using (69). The probability density function (PDF) and the cumulative distribution function (CDF) are shown for two extreme values of $\sigma$ in Figures 13 and 14. From these it is easy to see that the distribution is continuous, and the entropy, which is given by (38), is found to be zero.
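As a minimal sketch, assuming the two first-row entries of Table 5 are exactly Eqs. (63) and (66), the corresponding kernels can be evaluated as below. The function names (`gauss`, `kernel_from_plus`, `kernel_from_minus`, `integrate_over_y`) and the parameter values in the usage note are illustrative choices, not taken from the original derivation.

```python
import math

def gauss(y, mean, sigma):
    """Gaussian density with the given mean and standard deviation sigma."""
    return math.exp(-(y - mean) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def kernel_from_plus(y, y_prev, q, sigma, J, h):
    """Gamma(1, y, 1 | 1, y', 1), following Eq. (63)."""
    if y_prev > 0:                    # A(J + h y') = J, so the observation is y
        return (1.0 - q) * gauss(y, 1.0, sigma)
    if -2.0 * J / h < y_prev < 0:     # A(J + h y') = J + h y', so the observation is y - y'
        return (1.0 - q) * gauss(y - y_prev, 1.0, sigma)
    return 0.0

def kernel_from_minus(y, y_prev, q, sigma, J, h):
    """Gamma(1, y, 1 | -1, y', 1), following Eq. (66)."""
    if y_prev > 2.0 * J / h:          # A(-J + h y') = J, so the observation is y
        return (1.0 - q) * gauss(y, 1.0, sigma)
    return 0.0

def integrate_over_y(kernel, y_prev, q, sigma, J, h, lo=-10.0, hi=12.0, step=0.01):
    """Plain Riemann sum of the kernel over the new observation y."""
    n = int(round((hi - lo) / step))
    return step * sum(kernel(lo + k * step, y_prev, q, sigma, J, h) for k in range(n + 1))
```

For an allowed previous observation, integrating either kernel over $y$ should return the transition weight $1-q$; for a forbidden one it returns zero. This is a quick sanity check on the indicator ranges, not a solver for the coupled integral equations.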
Abstract The performance of Maximum a posteriori (MAP) estimation is studied analytically for binary symmetric multi-channel Hidden Markov processes. We reduce the estimation problem to a 1D Ising spin model and define order parameters that correspond to different characteristics of the MAP-estimated sequence. The solution to the MAP estimation problem has different operational regimes separated by first order phase transitions. The transition points for an L-channel system with identical noise levels are uniquely determined by L being odd or even, irrespective of the actual number of channels. We demonstrate that for lower noise intensities, the number of solutions is uniquely determined for odd L, whereas for even L there are exponentially many solutions. We also develop a semi-analytical approach to calculate the estimation error without resorting to brute force simulations. Finally, we examine the tradeoff between a system with a single low-noise channel and one with multiple noisy channels.

Keywords Multi-channel Hidden Markov Model · Maximum a Posteriori Estimation · Ising Model
1 Introduction

In many dynamical systems direct observation of the internal system state is not possible. Instead, one has noisy observations of those states. When the internal dynamics is governed by a Markovian process, the resulting systems are known as Hidden Markov Models, or HMMs. The HMM is the simplest tool for modeling systems where correlated data passes through a noisy channel [1,2]. It is used extensively to model such physical systems and finds applications in various areas such as signal processing, speech recognition, bioinformatics and so on [3,4].

One of the major problems underlying HMMs is the estimation of the hidden state sequence given the observations, which is usually done through the maximum a posteriori (MAP) estimation technique. A natural characteristic of MAP estimation is the accuracy of estimation, i.e., the closeness of the original and estimated sequences. Another important characteristic is the number of solutions it produces in response to a given observed sequence. This is related to the concept of trackability, which describes whether the number of hidden state sequences grows polynomially (trackable) or exponentially (non-trackable) with the length of the observed sequence. For so-called weak models^1 it was established that there is a sharp transition between trackable and non-trackable regimes as one varies the noise [5]. A generalization to the probabilistic case was done for a binary symmetric HMM [6], where it was shown that the non-trackability is related to a finite zero-temperature entropy of an appropriately defined Ising model. In particular, [6] demonstrated that there is a critical noise intensity above which the number of MAP solutions is exponential in the length of the observed sequence.

In this work we study the performance of MAP estimation for the scenario where the hidden dynamics is observed via multiple observation channels, which we refer to as multi-channel HMMs. The multi-channel HMM is a more robust way of modeling multi-channel signals, such as in sensor networks where local readings from different sensors are used to make inference about the underlying process. Note that the channels can generally have different characteristics, i.e., measurement noise. Here we assume that the readings of the channels are conditionally independent given the underlying hidden state.

One of our main results is that the presence of additional observation channels does not always improve the inference in the sense of the above characteristics. In particular, in systems with an even number of channels and identical noise intensities, there is always an exponential number of solutions for any noise intensity, indicated by a finite zero-temperature entropy of the corresponding statistical physics system. Intuitively, this happens because of conflicting observations from different channels, which produce a macroscopic frustration of spins in the system. Furthermore, for two-channel systems with generally different noise characteristics, we calculate the phase boundary between the zero and non-zero entropy regimes in the parameter space. In all the systems studied, with single or multiple channels, we observe discontinuous phase transitions in the thermodynamic quantities as the noise is varied. The points of phase transition are the same for all systems with an even number of channels. This is also true for all systems with an odd number of channels, but the transition points differ from those of the even-channel systems.

For general L-channel systems with identical noise in the channels, we calculate the different statistical characteristics of MAP and find that the average error between the true and estimated sequences reduces with the addition of channels. Furthermore, we bring in the notion of channel cost, which lets us explore the tradeoff between the total cost and the error that one can tolerate in the system. Finally, we also analyze the performance of MAP estimation for Gaussian noise, and find that the entropy is always zero, so that one has exactly one solution for every observed sequence. This suggests that the existence of exponentially degenerate solutions is related to the discreteness of the observations.

Let us provide a brief outline of the structure of our paper. We start by providing some general information about Maximum a Posteriori estimation in Section 2. We introduce binary symmetric HMMs and describe the mapping to an Ising model in Section 3. In Section 3.3 we describe the statistical physics approach to MAP estimation for multi-channel binary HMMs and define appropriate order parameters for characterizing MAP performance. Section 4 describes the solution of the model. The recurrent states are given in Section 5, followed by the presentation of results in Section 6. Section 6.1 focuses on a detailed analysis of the two observation channel scenario and Section 6.2 provides results for the multiple channel scenario. In Section 6.3 we discuss the cost of designing a multiple channel system and its impact on the system performance relative to a single channel system. Finally, we conclude by discussing our results and future work. In the appendix, we give details about the analytical calculation of MAP accuracy and elaborate on the Gaussian observation model along with the main differences from the binary symmetric case.



2 MAP Estimation 

The present work focuses on a specific class of stochastic processes, namely binary symmetric HMMs, although the techniques of MAP estimation can be applied to general stochastic processes too. In this section, we give a brief overview of MAP estimation for general HMMs and defer the study of binary symmetric HMMs to Section 3.

Let us consider $\mathbf{x} = (x_1, \ldots, x_N)$ as the signal generating sequence. The observations in $L$ different channels can be generically written as $\mathbf{y} = (\mathbf{y}^1, \ldots, \mathbf{y}^L)$ with $\mathbf{y}^i = (y^i_1, \ldots, y^i_N)$, $i \in \{1, \ldots, L\}$. Here $\mathbf{x}$ and $\mathbf{y}$ are the realizations of discrete time random processes $\mathcal{X}$ and $\mathcal{Y}$, with $i$ being the index for the observation channel as shown in Figure 1. The $\mathbf{y}^i$'s are the noisy observations of $\mathcal{X}$ obtained from different observation channels. The observations from different channels are mutually independent and are described by the conditional probability $p(\mathbf{y}^i|\mathbf{x})$. The observed sequences $\mathbf{y}^i$ are considered to be given along with the probabilities $p(\mathbf{y}^i|\mathbf{x})$ and $p(\mathbf{x})$. Thus, we are required to find the generating sequence $\mathbf{x}$ from the given sets of observation sequences.

^1 Weak models can be described as a simplification of HMMs where one specifies possible transitions and observations from a given state without assigning probabilities.

Fig. 1 (Color online) Hidden Markov model with multiple observation sequences

MAP provides a method to estimate the generating sequence $\mathbf{x}$ on the basis of the observations $\mathbf{y}^i$. The estimate $\hat{\mathbf{x}}$ is found by maximizing over $\mathbf{x}$ the posterior probability,

$$p(\mathbf{x}|\mathbf{y}^1, \ldots, \mathbf{y}^L) = \frac{p(\mathbf{y}^1, \ldots, \mathbf{y}^L|\mathbf{x})\, p(\mathbf{x})}{p(\mathbf{y}^1, \ldots, \mathbf{y}^L)}$$

Since $p(\mathbf{y}^1, \ldots, \mathbf{y}^L)$ does not depend on $\mathbf{x}$, we can equivalently minimize the Hamiltonian, which is given by,

$$H(\mathbf{y}, \mathbf{x}) = -\log[p(\mathbf{y}^1, \ldots, \mathbf{y}^L|\mathbf{x})\, p(\mathbf{x})] \tag{1}$$

where by log we imply the natural logarithm. Because of the mutual independence between the observations conditioned on $\mathbf{x}$, we can rewrite $H(\mathbf{y}, \mathbf{x})$ as

$$H(\mathbf{y}, \mathbf{x}) = -\log[p(\mathbf{x})] - \sum_{i=1}^{L} \log[p(\mathbf{y}^i|\mathbf{x})] \tag{2}$$

The advantage of using $H(\mathbf{y}, \mathbf{x})$ is that, if $\mathcal{Y}$ is ergodic^2 (which we will assume in the rest of the paper), then for $N \gg 1$, $H(\mathbf{y}, \hat{\mathbf{x}}(\mathbf{y}))$ will be independent of $\mathbf{y}$ for all $\mathbf{y} \in \Omega_N(\mathcal{Y})$, $\Omega_N(\mathcal{Y})$ being the typical set of $\mathcal{Y}$. The typical set is the set of sequences whose sample entropy is close to the true entropy, where entropy is a measure of uncertainty of a random variable and is a function of the probability distribution of the sequences in $\mathcal{Y}$. The typical set has total probability close to one, a consequence of the asymptotic equipartition property (AEP) [7]. Thus for $N \gg 1$, the sequence $(\mathbf{y}^1, \ldots, \mathbf{y}^L)$ will lie in the typical set of $\mathcal{Y}$ with high probability. As a result, we have $\sum_{\mathbf{y} \in \Omega_N(\mathcal{Y})} p(\mathbf{y}) \to 1$ and all elements of $\Omega_N(\mathcal{Y})$ have equal probability. We can take advantage of this and use for $H(\mathbf{y}, \hat{\mathbf{x}}(\mathbf{y}))$ the averaged quantity $\sum_{\mathbf{y}} p(\mathbf{y}) H(\mathbf{y}, \hat{\mathbf{x}}(\mathbf{y}))$.

^2 Ergodicity implies that the time average is equal to the ensemble average, i.e., an ergodic process has the same behavior averaged over time as averaged over the space of all its states.

Let us consider some extreme cases of noise in the observation channels. Since we are dealing with more than one observation channel, we have to consider the overall contribution from different channels with varying noise. When we refer to weak and strong noise in the channels, we do so with respect to the noisiest channel of the whole set. The other channels will be relatively cleaner and modify the overall performance. (i) When the noise applied to the channels is very weak, $p(\mathbf{x}|\mathbf{y}) = \prod_{i=1}^{L} \prod_{k=1}^{N} \delta(x_k - y^i_k)$. This results in recovering the generating sequence almost exactly. (ii) For the case of strong noise in the channels the estimation is prior dominated, i.e., $p(\mathbf{x}|\mathbf{y}) = p(\mathbf{x})$, and hence is not informative. Without any prior imposed, $p(\mathbf{x}) \propto \text{const}$. Here, the MAP estimation reduces to the maximum likelihood (ML) estimation scheme. It should be noted that using the ML estimate we can also obtain the exact sequence in the low noise regime.

Minimization of the Hamiltonian (1) for a given $\mathbf{y}$ can be readily done using the Viterbi algorithm, but it produces one single optimal estimate $\hat{\mathbf{x}}(\mathbf{y})$. For completeness we should also seek other possible sequences $\mathbf{x}^{[\gamma]}(\mathbf{y})$ for which $H(\mathbf{y}, \mathbf{x}^{[\gamma]}(\mathbf{y}))$ (greater than $H(\mathbf{y}, \hat{\mathbf{x}}(\mathbf{y}))$) is almost equal to the optimal estimated Hamiltonian in the large $N$ limit. Under that approximation we have $\lim_{N \to \infty} H(\mathbf{y}, \mathbf{x}^{[\gamma]}(\mathbf{y}))/N = \lim_{N \to \infty} H(\mathbf{y}, \hat{\mathbf{x}}(\mathbf{y}))/N$. The sequences obtained from the MAP estimate are equivalent for $N \to \infty$ and can be listed as,

$$\mathbf{x}^{[\gamma]}(\mathbf{y}), \quad \gamma = 1, \ldots, \mathcal{N}(\mathbf{y}) \tag{3}$$

with $\mathcal{N}(\mathbf{y})$ denoting the number of such possible sequences. If $\log \mathcal{N}(\mathbf{y}) \propto N$, the ergodicity argument can be used to obtain the logarithm of the number of solutions for the observed typical sequence,

$$\Theta = \sum_{\mathbf{y}} p(\mathbf{y}) \log \mathcal{N}(\mathbf{y}) \tag{4}$$

In the limit $N \to \infty$, a finite value of $\theta = \Theta/N$ implies that there are exponentially many outcomes of minimizing $H(\mathbf{y}, \mathbf{x})$ over $\mathbf{x}$. The term $\Theta$ is called entropy by analogy with the Ising spin model, as will be explained in detail in the subsequent sections.

Various moments of $\mathbf{x}^{[\gamma]}(\mathbf{y})$ are calculated; these are random variables due to the dependence on $\mathbf{y}$. The knowledge of the moments along with the error analysis is employed to characterize the accuracy of the estimation. For weak noise these moments are close to those of the original process $\mathcal{X}$. We also evaluate the average overlap between the estimated sequences $\mathbf{x}^{[\gamma]}(\mathbf{y})$ and the observed sequences $\mathbf{y}$. If the overlap is close to one, it would imply an observation-dominated estimation of the sequence.
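The Viterbi minimization of (1)-(2) mentioned above can be sketched as follows for a generic discrete HMM with several conditionally independent channels. This is a minimal illustration, not the authors' implementation: the function names, the 0/1 state and observation encoding, and the example probabilities are all assumptions made here.

```python
import itertools
import math

def hamiltonian(x, ys, prior, trans, emits):
    """H(y, x) = -log[p(y^1..y^L | x) p(x)], Eqs. (1)-(2); states and symbols are indices."""
    H = -math.log(prior[x[0]])
    for k in range(1, len(x)):
        H -= math.log(trans[x[k - 1]][x[k]])
    for y, emit in zip(ys, emits):            # channels are conditionally independent
        for k, xk in enumerate(x):
            H -= math.log(emit[xk][y[k]])
    return H

def viterbi(ys, prior, trans, emits):
    """Return (x_hat, H_min): one minimizer of H(y, x) found by dynamic programming."""
    n_states, N = len(prior), len(ys[0])

    def local(s, k):                          # per-site cost summed over all channels
        return -sum(math.log(emit[s][y[k]]) for y, emit in zip(ys, emits))

    cost = [-math.log(prior[s]) + local(s, 0) for s in range(n_states)]
    back = []
    for k in range(1, N):
        new_cost, ptr = [], []
        for s in range(n_states):
            best = min(range(n_states), key=lambda r: cost[r] - math.log(trans[r][s]))
            ptr.append(best)
            new_cost.append(cost[best] - math.log(trans[best][s]) + local(s, k))
        cost = new_cost
        back.append(ptr)
    s = min(range(n_states), key=lambda r: cost[r])
    H_min, path = cost[s], [s]
    for ptr in reversed(back):                # follow back-pointers to recover the path
        s = ptr[s]
        path.append(s)
    return path[::-1], H_min

# Illustrative two-state, two-channel example (values chosen arbitrarily).
prior = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]
emits = [[[0.9, 0.1], [0.1, 0.9]], [[0.7, 0.3], [0.3, 0.7]]]
ys = [[0, 0, 1, 1, 0], [0, 1, 1, 1, 0]]
x_hat, H_min = viterbi(ys, prior, trans, emits)
```

As the text notes, the algorithm returns a single optimal sequence; it does not by itself reveal whether other sequences share the same minimal value of $H$.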



3 Binary symmetric Hidden Markov Processes 

3.1 Definition 

We analyze a binary, discrete-time Markov stochastic process $\mathcal{X} = (x_1, x_2, \ldots, x_N)$. Each random variable $x_k$ can have only two realizations $x_k = \pm 1$. The Markov feature implies,

$$p(\mathbf{x}) = \prod_{k=2}^{N} p(x_k|x_{k-1})\, p(x_1) \tag{5}$$

where $p(x_k|x_{k-1})$ is the time-independent transition probability of the Markov process. The state diagram for the binary, discrete time Markov process is shown in Figure 2. We parameterize the binary symmetric situation by a single number $0 < q < 1$, with $p(1|1) = p(-1|-1) = 1 - q$ and $p(1|-1) = p(-1|1) = q$, and the stationary state distribution is considered to be uniform, $p_{st}(1) = p_{st}(-1) = \frac{1}{2}$. The noise process is assumed to be memory-less, time independent and unbiased. Thus,

$$p(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{L} \prod_{k=1}^{N} \pi^i(y^i_k|x_k), \quad y^i_k = \pm 1 \tag{6}$$

where $\pi^i(y^i_k|x_k)$ is the probability of observing $y^i_k$ in channel $i$ given the state $x_k$. For the channel, $\pi^i(-1|1) = \pi^i(1|-1) = \epsilon_i$ and $\pi^i(1|1) = \pi^i(-1|-1) = 1 - \epsilon_i$, $\epsilon_i$ being the probability of error in the channel. Here memory-less refers to the factorization in (6), time independence refers to the fact that $\pi^i(\ldots|\ldots)$ does not depend on $k$, and finally unbiased means that the noise acts symmetrically on both realizations of the Markov process, i.e., $\pi^i(1|-1) = \pi^i(-1|1)$. Starting from this, we vary the noise between multiple observation channels and study its effect on the sequence decoding process. In Appendix 8.2 we discuss a more general case involving a Gaussian distribution of noise realizations in the observation channels. The detailed formalism there is developed for the case of a single observation channel.

The composite process $\mathcal{X}\mathcal{Y}$ with realizations $(y^i_k, x_k)$ is Markov even though $\mathcal{Y}$ in general is not a Markov process. The transition probabilities for the $i$-th observation channel can be written as,

$$p(y^i_{k+1}, x_{k+1}|y^i_k, x_k) = \pi^i(y^i_{k+1}|x_{k+1})\, p(x_{k+1}|x_k) \tag{7}$$

Fig. 2 (Color online) State Diagram for a Binary symmetric Markov chain
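The generative model of (5) and (6) can be sketched as a short sampler. This is an illustrative helper under the stated assumptions of the section (uniform initial state, flip probability $q$, independent channel error probabilities $\epsilon_i$); the function name `sample_hmm` and its `seed` argument are introduced here for convenience.

```python
import random

def sample_hmm(N, q, eps, seed=0):
    """Sample a hidden sequence x (values +-1) and one observation sequence per channel."""
    rng = random.Random(seed)
    x = [rng.choice((1, -1))]                            # uniform stationary start
    for _ in range(N - 1):
        x.append(-x[-1] if rng.random() < q else x[-1])  # flip with probability q
    # Each channel flips every symbol independently with its own error probability.
    ys = [[-xk if rng.random() < e else xk for xk in x] for e in eps]
    return x, ys

x, ys = sample_hmm(20000, 0.24, [0.1, 0.3], seed=1)
```

With a long sample, the empirical flip rate of `x` and the empirical error rate of each channel should be close to $q$ and $\epsilon_i$ respectively.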



3.2 Mapping to the Ising model 

The problem can be efficiently mapped to the Ising spin model, where we represent the transition probabilities as,

$$p(x_k|x_{k-1}) = \frac{e^{J x_k x_{k-1}}}{2\cosh J}, \quad J = \frac{1}{2} \log\left[\frac{1-q}{q}\right] \tag{8}$$

The observation probabilities for binary symmetric noise are given as,

$$\pi^i(y^i_k|x_k) = \frac{e^{h_i y^i_k x_k}}{2\cosh h_i}, \quad h_i = \frac{1}{2} \log\left[\frac{1-\epsilon_i}{\epsilon_i}\right] \tag{9}$$

and that for the Gaussian distribution of noise in the observation channels is given by,

$$\pi^i(y^i_k|x_k) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{(y^i_k - x_k)^2}{2\sigma_i^2}} \tag{10}$$

where $\sigma_i^2$ is the noise variance.

Combining (9) with (6), and with the help of (5) and (8), we represent the log-likelihood for the case with binary symmetric noise realization as,

$$H_B(\mathbf{y}, \mathbf{x}) = -J \sum_{k=2}^{N} x_k x_{k-1} - \sum_{i=1}^{L} h_i \sum_{k=1}^{N} y^i_k x_k \tag{11}$$

The same for the Gaussian distribution of noise (see (10)) is given as,

$$H_G(\mathbf{y}, \mathbf{x}) = -J \sum_{k=2}^{N} x_k x_{k-1} + \sum_{i=1}^{L} \frac{1}{2\sigma_i^2} \sum_{k=1}^{N} (y^i_k - x_k)^2 \tag{12}$$

which simply reduces to,

$$H_G(\mathbf{y}, \mathbf{x}) = -J \sum_{k=2}^{N} x_k x_{k-1} - \sum_{i=1}^{L} \frac{1}{\sigma_i^2} \sum_{k=1}^{N} y^i_k x_k \tag{13}$$

The redundant additive factors are omitted from the final expressions of the Hamiltonian. The subscripts 'B' and 'G' denote the binary and Gaussian cases respectively. $H(\mathbf{y}, \mathbf{x})$ (representing the general Hamiltonian for both 'B' and 'G') is the Hamiltonian of a one-dimensional (1d) Ising model with external random fields governed by the probability $p(\mathbf{y}|\mathbf{x})$ [8]. The factor $J$ is the spin-spin interaction constant, uniquely determined by the transition probability $q$. For $q < \frac{1}{2}$, the constant $J$ is positive. This refers to the ferromagnetic situation where the spins tend to align in the same direction. The random fields (within the Ising model) obtained in the expression for $H(\mathbf{y}, \mathbf{x})$ display a non-Markovian correlation. This is different from the random fields known from the literature, which are in general uncorrelated [9], [10]. For all calculations, we assume $h_i > 0$. Let us also introduce another parameter $\alpha = h_2/h_1$, which represents the ratio of the noise between two observation channels (subscripts 1 and 2 refer to the two observation channels) and is assumed to be a positive integer.
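The claim that only redundant additive factors are dropped in going from (1) to (11) can be checked numerically on a short chain. The sketch below assumes the couplings $J = \frac{1}{2}\log[(1-q)/q]$ and $h_i = \frac{1}{2}\log[(1-\epsilon_i)/\epsilon_i]$ of (8)-(9); the function names and the example sequences are illustrative.

```python
import itertools
import math

def coupling(p_flip):
    """J from q via Eq. (8), or h_i from eps_i via Eq. (9)."""
    return 0.5 * math.log((1.0 - p_flip) / p_flip)

def H_B(x, ys, J, hs):
    """Ising Hamiltonian of Eq. (11); x and the y^i take values +-1."""
    bond = sum(x[k] * x[k - 1] for k in range(1, len(x)))
    field = sum(h * sum(yk * xk for yk, xk in zip(y, x)) for h, y in zip(hs, ys))
    return -J * bond - field

def neg_log_joint(x, ys, q, eps):
    """-log[p(y|x) p(x)] computed directly from Eqs. (5) and (6)."""
    total = -math.log(0.5)                               # uniform p(x_1)
    for k in range(1, len(x)):
        total -= math.log(1.0 - q if x[k] == x[k - 1] else q)
    for e, y in zip(eps, ys):
        for yk, xk in zip(y, x):
            total -= math.log(1.0 - e if yk == xk else e)
    return total

q, eps = 0.24, [0.1, 0.3]
J, hs = coupling(q), [coupling(e) for e in eps]
ys = [[1, 1, -1, 1], [1, -1, -1, 1]]
# The difference between the two expressions should not depend on x.
offsets = [neg_log_joint(x, ys, q, eps) - H_B(x, ys, J, hs)
           for x in itertools.product((1, -1), repeat=4)]
```

All entries of `offsets` coincide up to rounding, so minimizing $H_B$ over $\mathbf{x}$ is equivalent to maximizing the posterior.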



3.3 Statistical Physics of MAP estimation 

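We now implement the MAP estimation to minimize the Hamiltonian $H(\mathbf{y}, \mathbf{x})$. In Section 2 we argued that this is equivalent to minimizing $\sum_{\mathbf{y}} p(\mathbf{y}) H(\mathbf{y}, \mathbf{x}(\mathbf{y}))$. For this purpose let us introduce a non-zero temperature $T = \frac{1}{\beta} \geq 0$ and write the conditional probability as,

$$p(\mathbf{x}|\mathbf{y}) = \frac{e^{-\beta H(\mathbf{y}, \mathbf{x})}}{Z(\mathbf{y})}, \quad Z(\mathbf{y}) = \sum_{\mathbf{x}} e^{-\beta H(\mathbf{y}, \mathbf{x})} \tag{14}$$

where $Z(\mathbf{y})$ is the partition function. Using ideas from statistical physics, we find that $p(\mathbf{x}|\mathbf{y})$ gives the probability distribution of states $\mathbf{x}$ for a system with Hamiltonian $H(\mathbf{y}, \mathbf{x})$. The system is assumed to be interacting with a thermal bath at temperature $T$, and with frozen random fields $h_i y^i_k$ (see (11)). For $T \to 0$ and a given $\mathbf{y}$, the individual terms of the partition function are strongly peaked at those $\mathbf{x}(\mathbf{y})$ which minimize the Hamiltonian $H(\mathbf{y}, \mathbf{x})$, i.e., at the ground states. If, however, the limit $T \to 0$ is applied after the limit $N \to \infty$ we get,

$$p(\mathbf{x}|\mathbf{y}) \to \frac{1}{\mathcal{N}(\mathbf{y})} \sum_{\gamma=1}^{\mathcal{N}(\mathbf{y})} \delta[\mathbf{x} - \mathbf{x}^{[\gamma]}(\mathbf{y})] \tag{15}$$

where $\mathbf{x}^{[\gamma]}(\mathbf{y})$ and $\mathcal{N}(\mathbf{y})$ are given by (3). We are going to work in this low temperature regime from now on. The average of $H(\mathbf{y}, \mathbf{x})$ in the $T \to 0$ regime will equal its value minimized over $\mathbf{x}$,

$$\sum_{\mathbf{x}\mathbf{y}} p(\mathbf{y})\, p(\mathbf{x}|\mathbf{y})\, H(\mathbf{y}, \mathbf{x}) = \sum_{\mathbf{y}} p(\mathbf{y})\, H(\mathbf{y}, \mathbf{x}^{[1]}(\mathbf{y})) = H(\mathbf{y}, \mathbf{x}^{[1]}(\mathbf{y})) \tag{16}$$

where by assumption all ground state configurations $\mathbf{x}^{[\gamma]}(\mathbf{y})$ have the same energy, $H(\mathbf{y}, \mathbf{x}^{[\gamma]}(\mathbf{y})) = H(\mathbf{y}, \mathbf{x}^{[1]}(\mathbf{y}))$ for any $\gamma$.

The zero-temperature entropy measures the number of MAP solutions and is given as,

$$\Theta = -\sum_{\mathbf{x}\mathbf{y}} p(\mathbf{y})\, p(\mathbf{x}|\mathbf{y}) \log p(\mathbf{x}|\mathbf{y}) = \sum_{\mathbf{y}} p(\mathbf{y}) \log \mathcal{N}(\mathbf{y}) \tag{17}$$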
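For a short chain the ground-state degeneracy $\mathcal{N}(\mathbf{y})$ entering (17) can be counted by direct enumeration. The sketch below is only an illustration: it uses the binary Hamiltonian (11) with small integer couplings (an assumption made here so that energy ties are exact; the actual $J$ and $h_i$ are logarithms), and the function name `ground_states` is hypothetical.

```python
import itertools
import math

def ground_states(ys, J, hs):
    """Enumerate all minimizers of H_B, Eq. (11), for a short chain with integer couplings."""
    N = len(ys[0])
    field = [sum(h * y[k] for h, y in zip(hs, ys)) for k in range(N)]
    best, argmin = None, []
    for x in itertools.product((1, -1), repeat=N):
        H = -J * sum(x[k] * x[k - 1] for k in range(1, N)) - sum(f * xk for f, xk in zip(field, x))
        if best is None or H < best:
            best, argmin = H, [x]          # strictly lower energy: restart the list
        elif H == best:
            argmin.append(x)               # exact tie: another ground state
    return best, argmin

# Two identical channels that agree everywhere: a single MAP solution.
E_agree, gs_agree = ground_states([[1, 1, 1], [1, 1, 1]], J=2, hs=[1, 1])
# Two identical channels that disagree everywhere: the fields cancel.
E_conflict, gs_conflict = ground_states([[1, 1, 1], [-1, -1, -1]], J=2, hs=[1, 1])
entropy_conflict = math.log(len(gs_conflict))
```

In the conflicting case the external fields cancel and both fully aligned configurations are ground states, which is the finite-size counterpart of the frustration mechanism described in the Introduction.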

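Another important statistical quantity, the free energy, can also be defined as,

$$F(J, h, T) = -T \sum_{\mathbf{y}} p(\mathbf{y}) \log \sum_{\mathbf{x}} e^{-\beta H(\mathbf{y}, \mathbf{x})} \tag{18}$$

where the Ising Hamiltonian $H(\mathbf{y}, \mathbf{x})$ is given by (11) or (13). With the help of this definition we can now define the entropy in terms of the free energy as,

$$\Theta = -\partial_T F|_{T \to 0} \tag{19}$$

We also define below the order parameters, which are the relevant characteristics of MAP,

$$c = \sum_{\mathbf{x}\mathbf{y}} p(\mathbf{y})\, p(\mathbf{x}|\mathbf{y})\, \frac{1}{N} \sum_{k=1}^{N-1} x_k x_{k+1} = -\frac{1}{N}\, \partial_J F \tag{20}$$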
$$v = \sum_{\mathbf{x}\mathbf{y}} p(\mathbf{y})\, p(\mathbf{x}|\mathbf{y})\, \frac{1}{N} \sum_{i=1}^{L} \sum_{k=1}^{N} y^i_k x_k = -\frac{1}{N}\, \partial_h F \tag{21}$$
$c$ accounts for the correlation between the neighboring spins in the estimated sequence and $v$ measures the overlap between the estimated and the observed sequences for the various cases that are studied.

The order parameters defined above are indirect measures of estimation accuracy. Calculation of the direct error involves evaluating the overlap between the actual hidden sequence that generates a given observation sequence, and the inferred sequence based on the observation sequence. Since we are interested in average quantities, it is clear that calculating the average error amounts to finding the overlap between a typical hidden sequence and its MAP estimate, i.e.,

$$\Delta = \frac{1}{N} \sum_{k=1}^{N} s_k x_k \tag{22}$$

where $\mathbf{s} = (s_1, \ldots, s_N)$ is a typical sequence generated by the (hidden) Markov chain. Let us define a modified Hamiltonian,

$$H_g(\mathbf{y}, \mathbf{x}; \mathbf{s}) = -J \sum_{k=2}^{N} x_k x_{k-1} - \sum_{i=1}^{L} h_i \sum_{k=1}^{N} y^i_k x_k - g \sum_{k=1}^{N} s_k x_k \tag{23}$$

Further, let us introduce $f_g$, which is related to the free energy $F_g$ of the modified Hamiltonian by $f_g = F_g/N$. $F_g$ is given by,

$$F_g = -T \sum_{\mathbf{y}, \mathbf{s}} p(\mathbf{y}, \mathbf{s}) \log \sum_{\mathbf{x}} e^{-\beta H_g(\mathbf{y}, \mathbf{x}; \mathbf{s})} \tag{24}$$

With this the overlap can be simply given by $\Delta = -\partial_g f_g|_{g \to 0, \beta \to \infty}$, where the limits are taken in the order indicated.



4 Solution of the Model 

Here we solve for the order parameters and entropy of multiple observation channels using the tools that we developed with the Ising spin model. The solution of the model for $L = 1$ was provided in [6]. To make the paper self-contained, here we repeat the main steps of the derivation. Let us recall the partition function (14), which for $L$ observation sequences can be written as,

$$Z(\mathbf{y}) = \sum_{x_1 = \pm 1, \ldots, x_N = \pm 1} e^{\beta J \sum_{k=1}^{N-1} x_{k+1} x_k + \beta \sum_{i=1}^{L} h_i \sum_{k=1}^{N} y^i_k x_k} \tag{25}$$

Summing over the first spin yields the following transformation for $Z(\mathbf{y})$,

$$\sum_{x_1, \ldots, x_N} e^{\beta J \sum_{k=1}^{N-1} x_{k+1} x_k + \beta \sum_{i=1}^{L} h_i \sum_{k=1}^{N} y^i_k x_k} = \sum_{x_2, \ldots, x_N} e^{\beta J \sum_{k=2}^{N-1} x_{k+1} x_k + \beta \sum_{i=1}^{L} h_i \sum_{k=3}^{N} y^i_k x_k + \beta \xi_2 x_2 + \beta B(\xi_1)} \tag{26}$$

where $\xi_2 = \sum_{i=1}^{L} h_i y^i_2 + A(\xi_1)$, $\xi_1 = \sum_{i=1}^{L} h_i y^i_1$ and,

$$A(u) = \frac{1}{2\beta} \log \frac{\cosh[\beta J + \beta u]}{\cosh[\beta J - \beta u]} \tag{27}$$

$$B(u) = \frac{1}{2\beta} \log\left(4 \cosh[\beta J + \beta u] \cosh[\beta J - \beta u]\right) \tag{28}$$

Hence, once the first spin is excluded from the chain, the field acting on the second spin changes from $\sum_{i=1}^{L} h_i y^i_2$ to $\sum_{i=1}^{L} h_i y^i_2 + A(\xi_1)$. In the zero temperature limit ($T \to 0$; $\beta \to \infty$) the functions $A(u)$ and $B(u)$ reduce to the following form,

$$A(u) = u\,\vartheta(J - |u|) + J\,\vartheta(u - J) - J\,\vartheta(-u - J) \tag{29}$$

$$B(u) = J\,\vartheta(J - |u|) + u\,\vartheta(u - J) - u\,\vartheta(-u - J) \tag{30}$$

where $\vartheta(x) = 0$ for $x < 0$ and $\vartheta(x) = 1$ for $x > 0$. Thus, after excluding one spin at a time and repeating the above steps, the partition function for the system is obtained as,

$$Z(\mathbf{y}) = e^{\beta \sum_{k=1}^{N} B(\xi_k)} \tag{31}$$

where $\xi_k$ is obtained from the recursion relation

$$\xi_k = \sum_{i=1}^{L} h_i y^i_k + A(\xi_{k-1}), \quad k = 1, 2, \ldots, N, \quad \xi_0 = 0 \tag{32}$$

Following similar steps with $H_G$ from (13) we get the recursion relation for the Gaussian realization of noise in the observation channels as,

$$\xi_k = \sum_{i=1}^{L} \frac{1}{\sigma_i^2}\, y^i_k + A(\xi_{k-1}), \quad k = 1, 2, \ldots, N, \quad \xi_0 = 0 \tag{33}$$

The $\xi_k$'s from the above random recursion relation are random quantities governed by the probability $p(\mathbf{y})$. As we can see from the above equations, for binary realizations $y^i_k$ can take a finite number of values while $\xi_k$ can in general take an infinite number of values. But in the asymptotic $\beta \to \infty$ limit, $A(u)$ and $B(u)$ reduce to the simple form given in (29) and (30) respectively. Thus we obtain a finite number of values for $\xi_k$ (the number can be large or small). For the Gaussian distribution, however, we get a continuous set of values for $\xi_k$.
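The zero-temperature functions (29)-(30) and the recursion (32) can be sketched as follows. The check at the end compares the ground-state energy obtained from the recursion with a brute-force minimum of (11). Two choices here are assumptions of this sketch rather than statements from the text: the last spin is maximized directly (contributing $|\xi_N|$ instead of $B(\xi_N)$, a boundary detail that does not affect per-site quantities for large $N$), and integer couplings are used so that the comparison is exact.

```python
import itertools
import math

def A0(u, J):
    """Zero-temperature limit of A(u), Eq. (29)."""
    return u if abs(u) < J else (J if u > 0 else -J)

def B0(u, J):
    """Zero-temperature limit of B(u), Eq. (30)."""
    return J if abs(u) < J else abs(u)

def A_beta(u, J, beta):
    """Finite-temperature A(u), Eq. (27)."""
    return math.log(math.cosh(beta * (J + u)) / math.cosh(beta * (J - u))) / (2.0 * beta)

def xi_sequence(ys, J, hs):
    """Recursion (32): xi_k = sum_i h_i y_k^i + A(xi_{k-1}), starting from xi_0 = 0."""
    xi, out = 0, []
    for k in range(len(ys[0])):
        xi = sum(h * y[k] for h, y in zip(hs, ys)) + A0(xi, J)
        out.append(xi)
    return out

def ground_energy_from_xi(ys, J, hs):
    """Ground-state energy from the recursion; the last spin contributes |xi_N|."""
    xi = xi_sequence(ys, J, hs)
    return -(sum(B0(u, J) for u in xi[:-1]) + abs(xi[-1]))

def ground_energy_brute(ys, J, hs):
    """Minimum of H_B, Eq. (11), over all spin configurations (short chains only)."""
    N = len(ys[0])
    field = [sum(h * y[k] for h, y in zip(hs, ys)) for k in range(N)]
    return min(-J * sum(x[k] * x[k - 1] for k in range(1, N))
               - sum(f * xk for f, xk in zip(field, x))
               for x in itertools.product((1, -1), repeat=N))
```

For large $\beta$, `A_beta` approaches `A0`, which is the statement that (27) reduces to (29) in the zero-temperature limit.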

The parametric form of $\xi_k$ is already explained in [6] for a single observation channel. Here we provide the generalization to two observation channels, i.e. $L = 2$, which can be easily extended to the multiple channel case. We can parameterize $\xi_k$ as $\xi_k(n_1, n_2, n_3) = (n_1 h_1 + n_2 h_2 + n_3 J) = ([n_1 + \alpha n_2] h_1 + n_3 J)$, where $h_2/h_1 = \alpha$, $n_1$ and $n_2$ can be positive and negative integers, while $n_3$ can take only three values $0, \pm 1$. It may be noted that the states $\xi(n_1, n_2, 0)$ are not recurrent: once $\xi_k$ takes a value with $n_3 = \pm 1$ [with $(n_1 + \alpha n_2)$ positive or negative], it never comes back to a state $\xi(n_1, n_2, 0)$. Thus in the limit $N \gg 1$ we can ignore the states $\xi(n_1, n_2, 0)$.

Let us now consider the problem of finding the stationary distribution of the random process given by (32). Note that the process $\mathcal{Y}$ with probabilities $p(\mathbf{y})$ is not Markovian, hence the process in (32) is not Markovian either, which slightly complicates the calculation of its stationary distribution. Towards the latter goal, let us consider an auxiliary Markov process $\mathcal{Z}$, which has identical statistical characteristics to the process $\mathcal{X}$. For this auxiliary Markov process the realization is denoted as $z$, so as not to mix it up with the original process realization $x$. We need to include $\mathcal{Z}$ to make the composite process Markov. Thus, since the realizations $[\xi, y^1, \ldots, y^L]$ alone are not Markov, we enlarge them to $[\xi, y^1, \ldots, y^L, z]$, which defines the composite Markov process. The conditional probability for the composite process is given by,

$$\omega(\xi, y^1, \ldots, y^L, z \mid \xi', y'^1, \ldots, y'^L, z') = p(z|z')\, \varphi(\xi \mid \xi', y^1, \ldots, y^L) \prod_{i=1}^{L} \pi^i(y^i|z) \tag{34}$$

Here $p(z|z')$ and $\pi^i(y^i|z)$ refer to the Markov process $\mathcal{Z}$ and the noise respectively, while $\varphi(\xi \mid \xi', y^1, \ldots, y^L)$ takes only two values, 0 and 1, depending on whether the transition is allowed by the recursion (32). We therefore need to determine $\varphi(\xi \mid \xi', y^1, \ldots, y^L)$ after finding all possible values of $\xi_k$. Let us first relate the stationary probabilities $\omega(\xi, y^1, \ldots, y^L, z)$ of the composite Markov process to the characteristics of MAP estimation: $\omega(\xi, y^1, \ldots, y^L, z)$ carries the information about the stationary probabilities $\omega(\xi)$. Using the definition of the partition function from (31) and of the free energy (18) for the composite Markov process, which is ergodic, the free energy is given as [9],

$$-f(J, h) = -F(J, h)/N = \sum_{\xi} \omega(\xi) B(\xi) \tag{35}$$

where the summation is taken over all possible values of $\xi$ [for a given $(J, h)$]. We study different processes in the multi-channel scenario: generally speaking, $L > 2$ with the same noise in the channels and $L = 2$ with varying noise in the two channels. For all these cases the $h_i$'s can be represented by a single quantity $h$. Once we find $f(J, h)$ we can apply (20), (21) to do further calculations. To calculate the entropy we first derive a convenient expression for the free energy, which can be obtained from (30), (31):

$$F(\mathbf{y}) = -\frac{T}{2} \sum_{k=1}^{N} \sum_{\pm} \log[2\cosh\{\beta(\xi_k \pm J)\}] \tag{36}$$

Using this we find,

$$\partial_T F(\mathbf{y})|_{T \to 0} = -\frac{\log 2}{2} \sum_{k=1}^{N} \delta(\xi_k \pm J)\Big|_{T \to 0} \tag{37}$$

where $\delta(\cdot)$ is the Kronecker delta function. In the $N \gg 1$ limit, under the assumption that the Markov process is ergodic, $\frac{1}{N} \sum_{k=1}^{N} \delta(\xi_k \pm J)$ should with probability one (for the elements of the typical set $\Omega_N(\mathcal{Y})$) converge to $\omega(\xi = J) + \omega(\xi = -J)$. We thus obtain [9],

$$\theta = \Theta/N = \frac{\log 2}{2} [\omega(J) + \omega(-J)] \tag{38}$$

For an Ising spin model the above formula implies that the zero temperature entropy can be extensive only when the external field $\xi$ acting on the spins has the same energy, $\xi x_k = \pm J$, as the spin-spin coupling constant $J$. This means that a macroscopic number of spins (in any of the models studied) is frustrated, i.e., the factors influencing those spins compensate each other. When the entropy is not zero, there are many sequences whose probabilities may still differ slightly from one another. The MAP characteristics $(c, v)$ then refer to averages over all those equivalent sequences. We will discuss this effect explicitly for the various cases analyzed.
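Equation (38) suggests a simple Monte Carlo check: run the zero-temperature recursion (32) on sampled observations and count how often $\xi_k = \pm J$. The sketch below does this for $L$ identical channels. It is an illustration under explicit assumptions: $J$ and $h$ are supplied directly as exact `Fraction` values (so that the equality test is exact), the flip probabilities are recovered by inverting (8)-(9), and the names `flip_prob` and `entropy_estimate` are hypothetical.

```python
import math
import random
from fractions import Fraction

def flip_prob(coupling):
    """Invert Eqs. (8)/(9): flip probability 1 / (1 + exp(2 * coupling))."""
    return 1.0 / (1.0 + math.exp(2.0 * float(coupling)))

def entropy_estimate(N, J, h, L, seed=0):
    """Estimate Theta/N via Eq. (38) for L identical channels with field h."""
    rng = random.Random(seed)
    q, eps = flip_prob(J), flip_prob(h)
    x, xi, hits = rng.choice((1, -1)), Fraction(0), 0
    for _ in range(N):
        if rng.random() < q:                      # hidden state flips with probability q
            x = -x
        # Sum of the L channel observations, each flipped with probability eps.
        total = sum(-x if rng.random() < eps else x for _ in range(L))
        a = xi if abs(xi) < J else (J if xi > 0 else -J)   # zero-temperature A, Eq. (29)
        xi = h * total + a                        # recursion (32) with h_i = h
        hits += (abs(xi) == J)                    # frustrated site: xi = +J or -J
    return 0.5 * math.log(2.0) * hits / N

theta_one = entropy_estimate(20000, Fraction(1), Fraction(3), L=1)
theta_two = entropy_estimate(20000, Fraction(1), Fraction(3), L=2)
```

With $h > 2J$ a single channel never produces $\xi_k = \pm J$, whereas two identical channels do so whenever their observations conflict, in line with the even-$L$ behavior described in the Introduction.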

Finally, we discuss the semi-analytical error analysis that has been developed in this work. For this let us consider the overlap defined in (22). Recall that the error estimate is given by $-\partial_g f_g|_{g \to 0, \beta \to \infty}$, where $f_g$ is defined through (24). The derivation is shown explicitly for a single observation channel in Appendix 8.1. The results are used to analytically solve for the error in the single and two observation channel cases and are provided in Section 6.1. The formalism can easily be generalized to multiple observation channels with some tedious algebra. In Appendix 8.1 we show that the overlap is given as,

$$\Delta_k = \sum_{\mathbf{y}, \mathbf{s}} p(\mathbf{y}, \mathbf{s})\, s_k \left[ f(\xi_k) + g(\xi_k) f(\xi_{k+1}) + \ldots + g(\xi_k) \cdots g(\xi_{N-1}) f(\xi_N) \right] \tag{39}$$

In the limit $N \to \infty$, $\Delta_k$ is independent of $k$. The average error is related to the overlap as $P_{MAP}\{\text{error}\} = \frac{1 - \Delta}{2}$.



5 Characterization of the recurrent states 

To calculate the quantities of interest we need to obtain the recurrent states for the different multi-channel
systems. The recurrent states are found and parameterized in a manner similar to that demonstrated for
L = 1 in [6]. In this section we analyze two scenarios. In Scenario I, we consider a 2 observation channel
system. The noise in either channel is varied in a certain fixed proportion. In Scenario II, we consider the 
general L > 2 channel system, each having the same noise intensity. Below we give the parameterizations 
of the recurrent states for these two cases separately. In both the scenarios we come across multiple phase 
transitions which we denote by m and the noise within any of those phases varies as ^ <h < -^z^- The phase 
transitions are reflected in the study of the order parameters and the error analysis and is discussed in detail 
in the next section. 
5.1 Scenario I: Two channels with different noise intensities

The parameter $\alpha = h_2/h_1$ is defined as the ratio between the $h_i$'s in the individual observation channels. The stationary states can be parameterized as $[a_i, \bar{a}_i]$, which are written in terms of $h$ (depicting the channel with higher noise),

$$a_i = J + (2 - i)h + \eta h$$

$$\bar{a}_i = -J - (2 - i)h + \eta h \tag{40}$$

The quantity $\eta$ takes one of the four values

$$\eta \in \{(\alpha + 1),\ (\alpha - 1),\ -(\alpha - 1),\ -(\alpha + 1)\} \tag{41}$$

The values of $\alpha = h_2/h_1$ are kept as positive integers, since for an arbitrary $\alpha$ (rational or irrational) the number of stationary states increases very fast, making the analysis complicated without contributing any additional generality to the problem. The stationary states in all the phases can be found from the above formula by substituting the values of $i$ from Table 1. The total number of states in the different phases is tabulated in Table 2.



Table 1 The possible values of i for different α at different phases denoted by m. In the table below, (...) denotes (i) for even α, all integer values in the range and (ii) for odd α, only the even values in the range.

m    α=1        α=2        α=3        α=4        α=5        α=6            α=7        α=8        α=9    α=10
---------------------------------------------------------------------------------------------------------------
1    2          2          2          2          2          2              2          2          2      2
2    2          2,3        2          2          2          2              2          2          2      2
3    2,4        2,...,4    2,4        2          2          2              2          2          2      2
4    2,4        2,...,5    2,4        2,5        2          2              2          2          2      2
5    2,4,6      2,...,6    2,4,6      2,5        2,6        2              2          2          2      2
6    2,4,6      2,...,7    2,4,6      2,4,5,7    2,6        2,7            2          2          2      2
7    2,...,8    2,...,8    2,...,8    2,...,8    2,...,8    2,7            2,8        2          2      2
8    2,...,8    2,...,9    2,...,8    2,...,9    2,...,8    2,4,7,9        2,8        2,9        2      2
9    2,...,10   2,...,10   2,...,10   2,...,10   2,...,10   2,4,7,9        2,4,6,8    2,9        2,10   2
10   2,...,10   2,...,11   2,...,10   2,...,11   2,...,10   2,4,6,7,9,11   2,4,6,8    2,4,9,11   2,10   2,11
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Table 2 Number of states for different a at different phases denoted by m 



a=\ a = 2 a = 3 a=4 a = 5 a = 6 a = 7 a = 8 a = 9 a=10 



1 

2 


00 00 


8 

16 


C30 00 


00 00 


00 00 


OC 00 


8 
8 


00 00 


00 00 


00 00 


3 
4 


16 
16 


24 
32 


16 
16 


8 

16 


8 
8 


OC CO 


8 
8 


8 
8 


8 
8 


8 
8 


5 


24 


40 


24 


16 


16 


8 


8 


8 


8 


8 


6 


24 


48 


24 


32 


16 


16 


8 


8 


8 


8 


7 


32 


56 


32 


56 


32 


16 


16 


8 


8 


8 


8 


32 


64 


32 


64 


32 


32 


16 


16 


8 


8 


9 


40 


72 


40 


72 


40 


32 


32 


16 


16 


8 


10 


40 


80 


40 


80 


40 


48 


32 


32 


16 


16 
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5.2 Scenario II: L— identical channels 

For a multiple observation channel system we can parameterize the stationary states for a given value of J
and h within any phase as

    b_i       =  J + (2 - i) h + \sum_{l=1}^{L} t_l h
    \bar{b}_i = -J - (2 - i) h + \sum_{l=1}^{L} t_l h    (42)

with t_l ∈ {+1, -1}. The possible values of i for different phases are given in Table 3. For the various
possible phases, the total number of states is given in Table 4. Here one can notice the significant rise in the
total number of states for large L as we go to a higher phase.



Table 3 The possible values of i for multiple observation channel systems with the same h at different phases denoted
by m. In the table below, (...) implies (i) for odd L, all integer values in the range and (ii) for even L, only the
even values in the range.

m   L even    L odd
1   2         2
2   2         2,3
3   2,4       2,...,4
4   2,4       2,...,5
5   2,4,6     2,...,6
6   2,4,6     2,...,7
7   2,...,8   2,...,8
8   2,...,8   2,...,9
9   2,...,10  2,...,10
10  2,...,10  2,...,11



Table 4 Number of states for multiple observation channels with the same h at different phases denoted by m

m   L=1   L=2   L=3   L=4   L=5   L=6   L=7   L=8
1   4     8     16    32    64    128   256   512
2   8     8     32    32    128   128   512   512
3   12    16    48    64    192   256   768   1024
4   16    16    64    64    256   256   1024  1024
5   20    24    80    96    320   384   1280  1536
6   24    24    96    96    384   384   1536  1536
7   28    32    112   128   448   512   1792  2048
8   32    32    128   128   512   512   2048  2048
9   36    40    144   160   576   640   2304  2560
10  40    40    160   160   640   640   2560  2560



6 Results 

In this section we describe our results, where we analyze different multiple observation channel systems
by evaluating the order parameters and the entropy. First, we consider a two-observation-channel system and
vary the noise in the individual channels. Next, we consider L observation channels with identical noise
intensities. Finally, we compare the performance of a single relatively "clean" channel with multiple "noisy"
ones by bringing in the notion of channel cost, and examine the tradeoffs between the two setups. We also
attempt to provide a physical intuition behind the results using simple thermodynamic principles
[9, 12, 13, 14, 15].

(a) c (b) v (c) v_1

Fig. 3 (Color online) Order parameters (a) c and (b) v obtained analytically by Ising Hamiltonian minimization (bold line),
which coincide exactly with the data obtained from the Viterbi algorithm (open squares), for two channels with different
noise at q = 0.24. The smooth decaying lines in the plot for c are obtained via the ML estimate and the horizontal blue line
depicts c_0. (c) v_1 is the overlap between the MAP and ML estimated sequences.

The system performance is considered for both maximum likelihood (ML) and maximum a posteriori (MAP)
estimation. In ML, the estimate of the current state depends only on the observations from the different
channels at that particular instant. In case of a tie, which can arise when the number of observation channels
is even, the state is chosen randomly to be either 0 or 1 with equal probability. For example, suppose the
number of observation channels is L (L being even), with L/2 of the channels having an observation of 1 and
the remaining L/2 an observation of 0. In this case, the state is chosen to be either 0 or 1 at random with
probability 1/2.
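
The tie-breaking rule described above can be sketched in a few lines. This is only an illustration, assuming that all L channels share the same noise level ε < 1/2, so that the ML estimate at one instant reduces to a majority vote over the L binary observations; the function name `ml_estimate` is hypothetical.

```python
import random


def ml_estimate(observations, rng=None):
    """Majority vote over the L binary (0/1) observations of one time step.

    Assumes identical noise below 1/2 in every channel (illustrative assumption).
    A tie, possible only for even L, is broken uniformly at random.
    """
    rng = rng or random.Random()
    ones = sum(observations)
    zeros = len(observations) - ones
    if ones > zeros:
        return 1
    if zeros > ones:
        return 0
    return rng.choice((0, 1))  # tie: pick 0 or 1 with probability 1/2
```

For odd L the function is deterministic; for even L repeated calls on a tied observation return 0 and 1 about equally often.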



6.1 Two-Channel scenario 

In this subsection, we study the performance of a two-observation-channel system with different noise
levels in the two channels.

Channel 1 has noise ε_1 and channel 2 has noise ε_2. The parameter α defines the relation between the noise
levels in the two channels. Specifically, α = h_2/h_1, as defined in Section 5.1.

Since we always take α > 1 (the case α < 1 is obtained by simply interchanging the channels), the noise level
in channel 2 is less than that in channel 1, i.e., channel 1 is "noisier" than channel 2. The order parameters
c and v, plotted by varying the noise ε_1 in channel 1 (the plots are made relative to the "noisier" observation
channel) for a fixed spin-spin correlation J (a function of q, as defined earlier), are shown in Figure 3. We
observe different operational regimes that are separated by first-order phase transitions. The point of the
first phase transition gradually moves to the right with an increase in α. This indicates that the overall
behavior is dominated by the cleaner channel, which is intuitive. In the region before the first phase
transition, the ML and MAP estimates coincide. For MAP estimation, the sequence correlation parameter c is
stable and noise-independent before the first phase transition. Afterwards the correlation shows discrete
jumps and approaches the prior dominated value of 1, whereas with the ML estimate c decreases monotonically
to 0. The advantage of MAP estimation over ML is that the MAP estimate is supported by the prior and hence
performs better at intermediate noise ranges. For example, when α = 2, the correlation c degrades faster for
ML estimation, while it is nearly constant over a large range of noise values for MAP estimation. The overlap
v before the first phase transition gradually shifts towards 1 as α increases, implying that the estimated
sequence is primarily driven by the observations. Before the first phase transition we encounter the
observation dominated regime, except for α = 1. The value of the overlap is not 1 because of the manner in
which the overlap is defined in (21). A clearer way to see the observation dominated regime is to consider
the overlap between the MAP and ML estimated sequences, which is plotted in Figure 3(c). This confirms that
there is no observation dominated regime for α = 1 in 2-channel systems. After the first phase transition the
overlap becomes worse, and at higher noise v decays towards 0.

Fig. 4 (Color online) The entropy of the system obtained analytically with varying ε (ε denoting the higher noise level) for
two observation channels at different noise ratios given by α at q = 0.24.

Fig. 5 (Color online) Regions of zero (blue) and non-zero (white) entropy shown in the ε_1 versus ε_2 (noise values in
channels 1 and 2, respectively) plot for different values of the parameter q.

Increasing α enlarges the observation dominated regime, or in other words, the region of ML estimation. It is
also interesting to note that the correlation and overlap parameters are stable over a longer range as α
increases, but show a rapid rise or decay, respectively, after the first phase transition. The analytical
results are verified by simulations using the Viterbi algorithm, and the two are in good agreement.
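
The Viterbi simulations referred to above can be reproduced in outline with the following sketch. It is not the authors' code: it assumes a symmetric binary Markov chain with flip probability q, a uniform initial distribution, spins coded as ±1, and L independent binary symmetric channels with flip probabilities `eps[l]`. The name `viterbi_map` is hypothetical, and when several MAP sequences exist the function returns only one of them.

```python
import math


def viterbi_map(obs, q, eps):
    """Return one MAP hidden sequence for a binary multi-channel HMM.

    obs : list of time steps, each a tuple of L observed symbols in {-1, +1}
    q   : flip probability of the hidden symmetric Markov chain (0 < q < 1)
    eps : list of L channel flip probabilities (each in (0, 1))
    """
    states = (-1, +1)

    def log_emit(x, y_k):
        # log-probability of the L observations at one instant given hidden state x
        return sum(math.log(1 - e) if y == x else math.log(e)
                   for y, e in zip(y_k, eps))

    def log_trans(a, b):
        return math.log(1 - q) if a == b else math.log(q)

    score = {x: math.log(0.5) + log_emit(x, obs[0]) for x in states}
    back = []  # back[k][x] = best predecessor of state x at step k + 1
    for y_k in obs[1:]:
        new_score, pointers = {}, {}
        for x in states:
            prev = max(states, key=lambda p: score[p] + log_trans(p, x))
            new_score[x] = score[prev] + log_trans(prev, x) + log_emit(x, y_k)
            pointers[x] = prev
        back.append(pointers)
        score = new_score

    # Trace back from the best final state.
    x = max(states, key=lambda s: score[s])
    path = [x]
    for pointers in reversed(back):
        x = pointers[x]
        path.append(x)
    return path[::-1]
```

With three nearly noiseless channels the estimate follows the observations, while with a single noisy channel and a strong prior an isolated flipped observation is overruled, in line with the observation dominated and prior dominated regimes discussed above.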

Let us now focus on the entropy. We see that the addition of a second observation channel with the same
noise as the first results in non-zero entropy for all possible values of ε. With the introduction of the
second observation channel, the fields h_2 and h_1, being of the same strength, can mutually cancel each
other. This gives rise to multiple degenerate ground states for the acting external field, and hence multiple
solutions are obtained from MAP estimation, resulting in non-zero entropy [2]. However, as we increase α, it
can be seen from Figure 4 that a region of zero entropy is obtained again and the point of the first phase
transition shifts to the right. After the first phase transition, the entropy rises, attains a maximum and
then decays. The reason behind this is discussed below, where we examine the regions of zero and non-zero
entropy obtained for a given parameter q.

Knowing the regions of zero and non-zero entropy is of particular interest for the two-observation-channel
system. We derive these regions for the possible values of the noise ε in the two observation channels for a
given parameter q; they are plotted in Figure 5. We can see qualitatively that along the ε_1 = ε_2 line, which
defines α = 1, we can never obtain a unique sequence from MAP estimation. This is due to the prevalence of
degenerate ground state solutions obtained at zero temperature, caused by the mutual nullification of the
opposing fields in the two observation channels. As we perturb the external random field applied in one of
the observation channels by a certain amount (depending on the value of q), we migrate to the region of zero
entropy. The region above the ε_1 = ε_2 line corresponds to α > 1, and at the zone boundary between zero and
non-zero entropy we have h_1 + 2J = h_2. Denoting by ε_1b and ε_2b the values of ε_1 and ε_2 at the zone
boundary, we have



1 - £lb £lb 



(43) 



(The zone boundary for the region below the ε_1 = ε_2 line can be obtained by interchanging ε_1b and ε_2b in (43).)

Before h_2 reaches the point of the first phase transition, the prior 2J and the noisier channel field strength
h_1 combine to cancel out the effect of the clean channel field h_2, and the spins in certain clusters of the
Ising chain are frustrated, which explains why we have non-zero entropy at zero temperature. Once h_2 crosses
the zone boundary, the mutual cancellation is no longer effective and we obtain the zero entropy regions. Thus,
only for certain [h_1, h_2] can we attain the zero entropy condition in the thermodynamic limit.
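
As a worked illustration, the boundary condition h_1 + 2J = h_2 can be written explicitly in terms of the noise levels. The sketch below assumes the usual mapping of a binary symmetric HMM onto an Ising chain, h_i = (1/2) ln[(1 - ε_i)/ε_i] and J = (1/2) ln[(1 - q)/q]; these definitions are not restated in this section, so they are an assumption here rather than a quotation of the text.

```latex
% Assumed definitions (not restated in this section):
%   h_i = \tfrac12 \ln\frac{1-\epsilon_i}{\epsilon_i}, \qquad J = \tfrac12 \ln\frac{1-q}{q}
\begin{align*}
h_1 + 2J = h_2
&\;\Longleftrightarrow\;
\tfrac12 \ln\frac{1-\epsilon_{1b}}{\epsilon_{1b}} + \ln\frac{1-q}{q}
 = \tfrac12 \ln\frac{1-\epsilon_{2b}}{\epsilon_{2b}} \\
&\;\Longleftrightarrow\;
\frac{1-\epsilon_{2b}}{\epsilon_{2b}}
 = \left(\frac{1-q}{q}\right)^{2} \frac{1-\epsilon_{1b}}{\epsilon_{1b}} .
\end{align*}
```

Under these assumptions, for q = 0.24 and ε_1b = 0.3 the right-hand side is roughly 23.4, which corresponds to ε_2b of about 0.04; the cleaner channel has to be far cleaner than the noisy one before a unique MAP sequence is recovered.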




(a) Error with same noise (b) Error with varying α

Fig. 6 (Color online) Plot of error from (a) MAP estimates for 1- and 2-observation-channel systems with the same noise in
each channel (analytical and simulated), and (b) ML (blue) and MAP (red) estimates obtained by varying α in the two channels
(simulated), for q = 0.24.

We will now find the error between the actual and estimated sequences. In Figure 6(a) the error plots are
made for (a) a single observation channel and (b) two observation channels with α = 1. The estimates are
obtained semi-analytically using the MAP technique (see (39), (54)). The details of the derivation, with
further analysis, are provided in Appendix 8.1.

In Figure 6(b) the ML and MAP error estimates are plotted using the Viterbi algorithm. Here we have
studied the system at higher values of α (the semi-analytical treatment described in Appendix 8.1 can be
extended to higher α with tedious calculations). We notice from ML as well as MAP estimation (which in this
case is close to the ML estimate, although its performance depends heavily on q) that the addition of a
cleaner channel (i.e., increasing α) results in a reduction of the error. For ML, the result is mainly
driven by the cleaner channel. The error due to ML estimation is simply the probability of error in the
cleaner channel, and is given as

PML{erwr} = Y^^Trjy^ (44) 

For higher α, the error is small for low and intermediate noise values, but at higher noise values the error
increases rapidly and becomes comparable to the error at lower α. We also notice that MAP estimates have lower
error relative to ML at smaller α for the particular value of q studied. However, as we increase α, for example
to α = 8, the ML and MAP errors become indistinguishable for any value of ε.

6.2 L-channels with identical noise 

In this subsection, we study a system of L observation channels with identical noise, for a fixed positive
value of the correlation coefficient J. The correlation c found from the MAP estimate matches exactly with
the ML estimate in regions with small noise, as can be seen from Figure 7(a). However, for intermediate values
of noise, c from MAP estimation is more stable for all values of L, whereas its value obtained from the ML
estimate decreases monotonically with increasing noise. (The ML estimates for L = 1 and L = 2 coincide, as
can be easily seen from (45) and (46).)

Interestingly, for higher L, c is stable even before the first phase transition, at lower noise regimes. This
can be seen by comparing the value of c with the reference value c_0 of the Markov process. c_0 and the ML
value c_ML are given below.

    c_0    = \sum_{x_1, x_2} x_1 x_2 \, p_{st}(x_1) \, p(x_2 | x_1) = 1 - 2q
    c_{ML} = (1 - 2q) \left( 1 - 2 P_{ML}\{error\}|_L \right)^2          (45)

P_ML{error}|_L is the ML error estimate for L even or odd and is given in (46). With an increase in noise
the correlation shows jumps in its value. These jumps become smaller and appear at closer intervals with
increasing noise intensity, and c saturates to the prior dominated value of 1. The overlap between the observed
and estimated sequences is plotted in Figure 7(b). As in the case of 2-channel systems, the overlap between
the MAP and ML estimated sequences is plotted in Figure 7(c). This gives a clearer indication of the
observation dominated regimes for different L. With an increase in noise there is a gradual drop in overlap,
with discrete jumps at the points of phase transition, tending towards 0 at high noise. However, on adding
more channels to the system we find that the overlap shows a gradual monotonic decay. All the above analytical
calculations are supported by simulations obtained by running the Viterbi algorithm, which are plotted along
with the analytical data for comparison.

(a) c (b) v (c) v_1

Fig. 7 (Color online) Order parameters (a) c and (b) v obtained analytically by Ising Hamiltonian minimization (bold line),
which coincide exactly with the data obtained from the Viterbi algorithm (open squares), for L-channel systems with the same
noise at q = 0.24. The smooth decaying lines in the plot for c are obtained via the ML estimate and the horizontal blue line
depicts c_0. (c) v_1 is the overlap between the MAP and ML estimated sequences.

(a) odd L (b) even L

Fig. 8 (Color online) The entropy of (a) odd and (b) even channel systems obtained analytically with varying ε (ε is the same
in all the channels of the L-channel system) at q = 0.24.

We now focus on the entropy of the system, defined as the natural logarithm of the number of MAP
solutions that we can possibly obtain. The entropy is plotted in Figure 8.

- When the number of channels in the system is odd, then for small values of noise (in the ML dominated
regime) there is a unique solution to the MAP estimation problem. When the noise is varied, the system
undergoes first-order phase transitions at the points h = 2J/m, m = 1, 2, .... In particular, the entropy
becomes non-zero at the first phase transition, h = 2J, signaling exponentially many solutions to the MAP
estimation problem. At each phase transition point there is a discrete jump in entropy. The magnitude of
these jumps diminishes with increasing L. However, the position of the phase transitions is independent of
the number of channels. Mapping the system to an Ising spin model, we can see that there are L + 1 effective
forces acting on a spin at a particular instant: the L magnetic fields due to the observation channels and
one due to the spin-spin interaction. At the point of the first phase transition, the prior 2J (quantifying
the spin-spin interaction) nullifies the effect of one of the channels, whereas the magnetic fields from the
remaining even number of channels compensate each other due to mutually conflicting observations.
- For even L, there are no regions of zero entropy and thus, for all possible even values of L, there are
exponentially many MAP solutions corresponding to a typical observation sequence. This is due to the fact
that for an even number of channels with the same noise, the magnetic fields acting on the spin can mutually
compensate each other, resulting in macroscopic frustration. Due to this, we have non-zero entropy for any
value of noise. The rise is again found to be more gradual for large even L. The points of phase transition
when L is even are given by h = J/m. Thus for these systems the number of phase transition points is reduced
by half relative to odd L.

[Figure panel title: Cost plot at q = 0.24]

(a) Error with varying L

Fig. 9 (Color online) Plot of error with ML (smooth) and MAP (wiggled) estimates obtained by varying the number of channels
for q = 0.24.
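
The odd/even contrast in the number of MAP solutions can be checked by brute force on very short chains. The sketch below is only illustrative: it assumes the zero-temperature Ising form H(x) = -J Σ_k x_k x_{k+1} - h Σ_k x_k Σ_l y_k^l for L identical channels (by analogy with (55)), enumerates all 2^N spin configurations, and counts the minimizers. The names `map_solutions` and `entropy` are hypothetical, and exact rational arithmetic is used so that ties are detected reliably.

```python
import math
from fractions import Fraction
from itertools import product


def map_solutions(obs, J, h):
    """Enumerate all minimizers of the assumed Ising Hamiltonian for a short chain.

    obs  : list of time steps, each a tuple of L observations in {-1, +1}
    J, h : coupling and field strength (ints or Fractions for exact ties)
    """
    fields = [h * sum(y_k) for y_k in obs]  # total field acting on each spin
    best, minimizers = None, []
    for x in product((-1, 1), repeat=len(obs)):
        energy = (-J * sum(a * b for a, b in zip(x, x[1:]))
                  - sum(f * s for f, s in zip(fields, x)))
        if best is None or energy < best:
            best, minimizers = energy, [x]
        elif energy == best:
            minimizers.append(x)
    return minimizers


def entropy(obs, J, h):
    """Natural logarithm of the number of MAP solutions for this observation sequence."""
    return math.log(len(map_solutions(obs, J, h)))
```

For two channels that disagree at every instant, `map_solutions([(1, -1)] * 3, Fraction(1), Fraction(1, 2))` returns the two fully aligned configurations, and for a single channel exactly at h = 2J the observation (1, -1, 1) also yields two degenerate minimizers, whereas a consistent single-channel observation gives a unique one.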

Finally, in Figure 9 we provide the accuracy of estimation in the multi-observation-channel system by MAP
estimation using the Viterbi algorithm. The same quantity is also calculated using the ML estimate (plotted
for comparison) and is given by the following formula:

    P_{ML}\{error\}|_{L=odd}  = 1 - \sum_{k=0}^{(L-1)/2} \binom{L}{k} \epsilon^k (1-\epsilon)^{L-k}
    P_{ML}\{error\}|_{L=even} = 1 - \sum_{k=0}^{L/2-1} \binom{L}{k} \epsilon^k (1-\epsilon)^{L-k}
                                  - \frac{1}{2} \binom{L}{L/2} \epsilon^{L/2} (1-\epsilon)^{L/2}      (46)

The estimation error is found to decrease with the addition of more observation channels.
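
The ML error of a majority vote over L identical binary symmetric channels, with ties broken at random for even L, is straightforward to evaluate numerically. The following sketch implements that standard expression and the resulting ML correlation c_ML = (1 - 2q)(1 - 2 P_ML)^2; the function names are hypothetical.

```python
from math import comb


def p_ml_error(eps, L):
    """ML (majority-vote) error for L identical channels with flip probability eps."""
    def term(k):
        # probability that exactly k of the L channels are flipped
        return comb(L, k) * eps**k * (1 - eps)**(L - k)

    if L % 2 == 1:
        # correct whenever at most (L - 1) / 2 channels are flipped
        return 1 - sum(term(k) for k in range((L - 1) // 2 + 1))
    half = L // 2
    # correct when fewer than L / 2 channels are flipped; a tie is right half the time
    return 1 - sum(term(k) for k in range(half)) - 0.5 * term(half)


def c_ml(q, eps, L):
    """Correlation of the ML-estimated sequence for hidden flip probability q."""
    return (1 - 2 * q) * (1 - 2 * p_ml_error(eps, L)) ** 2
```

Evaluating `p_ml_error(0.2, 1)` and `p_ml_error(0.2, 2)` gives the same value, which illustrates why a second identical channel does not improve the ML estimate, while a third channel does.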



6.3 One "clean" vs multiple noisy channels 

In this section, we bring in the notion of channel cost and use it to compare the performance of a
single-channel system with a multi-channel system at equal cost. Channel cost can be interpreted as a function
of the channel noise ε. In many scenarios, it is more expensive for a system designer to build a channel
with small noise than one with higher noise. For example, in realistic channels for data communication,
thermal noise at the receivers is a major source of channel noise, and building a receiver with low thermal
noise is expensive. The binary symmetric channels considered here can also be used to model a plant
producing equipment, with the noise interpreted as the probability of producing a defective item.
In order to reduce the defect rate, a system engineer would need expensive machines and good
maintenance, which increases the cost. Because of this inverse relation between channel cost and channel
error, for simplicity, we model the channel cost as being inversely proportional to the channel error and
linearly proportional to the number of channels in the system. Thus, for an L-channel system, the channel cost
is given by log(L/ε).




Figures 10(a)-10(c) give a comparison of cost vs. performance for various multi-channel systems with
varying q. The cost is plotted as log(L/ε), while the performance is measured in terms of error. A point on the
curve with a specific error value, say E, indicates the channel cost incurred in order to build a system that
can tolerate a maximum error of E. It is easy to see that if the channel cost were inversely proportional to
ε alone, then increasing L would always result in a cheaper system for a given error tolerance.
However, because of the linear dependence on the number of channels L, there is a certain value of E, say E_th,
beyond which the cost required to build an L-channel system becomes higher than that required for a single
channel system in order to attain the same level of performance in terms of error requirements.

In line with our argument, we observe from the plots that in order to build a system with small error, it
is advisable to increase L in order to minimize the channel cost. However, if we can tolerate a larger error, it
is cheaper to use a single-channel system. For example, when the error is small, the cost required to
build an 8-channel system is less than that required to build a single-channel or a 2-channel system. This is
true for all values of q. However, when the error is large, the single-channel system is preferable, since the
cost required is much lower than that needed to build a system with L > 1.
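
The cost model can be made concrete with a short sketch. It assumes the cost log(L/ε), i.e. a cost proportional to the number of channels L and inversely proportional to the per-channel noise ε, taken on a logarithmic scale; the function names are hypothetical and the restriction of ε to (0, 1/2) is an illustrative choice.

```python
import math


def channel_cost(num_channels, eps):
    """Cost of an L-channel system under the assumed model log(L / eps)."""
    if not 0 < eps < 0.5:
        raise ValueError("eps must lie in (0, 0.5)")
    return math.log(num_channels / eps)


def equal_cost_noise(num_channels, eps_single):
    """Per-channel noise at which L channels cost as much as one channel of noise eps_single."""
    return num_channels * eps_single
```

Under this model a budget that buys one channel with ε = 0.05 buys four channels with ε = 0.2 each, so comparing the estimation error of the two configurations at these noise levels is one way to read the tradeoff discussed above.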



6.4 Single observation channel with gaussian noise 

A detailed study of the single channel with a Gaussian observation noise model is relegated to Appendix 8.2.
Here, we only provide the results for the order parameters c and v, which are plotted in Figure 11. We also
show v_1, which is the overlap between the ML and MAP estimated sequences. The plots are continuous and, in
contrast to the discrete noise model considered earlier, no first-order phase transitions are observed. We
also find that the entropy is zero for this case (see Appendix 8.2 for details), meaning that we do not have
exponentially many solutions for a given observation sequence. An important point to note is that in the
discrete case, zero entropy signified an observation dominated regime, where the ML and MAP estimates
coincided. In the Gaussian case, however, we have zero entropy at all values of σ, but this does not mean that
the ML and MAP estimates coincide everywhere. This is seen from v_1, which goes monotonically to zero,
indicating that the correlation between the ML and MAP estimates reduces with an increase in the variance σ².

Footnote to Section 6.3: Here we take the log of L/ε to define the cost because for small error L/ε is exponentially large.

Fig. 11 (Color online) Order parameters c (in blue) and v (in red) as functions of σ, obtained analytically from Ising
Hamiltonian minimization (bold line) and from the Viterbi algorithm (open circles), for a single observation channel with
Gaussian noise at q = 0.24. v_1 is the overlap between the MAP and ML estimated sequences.



7 Conclusion 

In this paper we have presented an analytical study of Maximum a Posteriori (MAP) estimation for Multi- 
channel hidden Markov processes. We have considered a broad class of systems with odd and even number 
of channels (having the same noise intensities) to understand the MAP characteristics in analogy with the 
thermodynamic quantities. In all the system models studied here, we observe a sequence of first-order phase 
transitions in the performance characteristics of MAP estimation as one varies the noise intensity. Remarkably, 
the position of the first phase transition depends only on whether L is odd or even, but not on its value. 

In the systems with an odd number of channels, there is a low noise region where the MAP estimation
problem has a uniquely defined solution, as characterized by vanishing zero-temperature entropy of the
corresponding statistical physics system. At a finite value of noise, the system experiences a first order
phase transition and the number of solutions with posterior probability close to the optimal one increases exponentially. In
contrast, in systems with even number of channels, the MAP estimation always yields an exponentially large 
number of solutions, at all noise intensities. This is explained by drawing an analogy with the thermodynamic 
system where the spins experience macroscopic frustration due to contradicting observations from different 
channels. Our results indicate that for a system with L = 2 observation channels, one can recover the region 
of zero entropy by introducing noise-asymmetry in the channels. In addition to the binary symmetric channel 
we have also considered the Gaussian observation channel, and demonstrated that the corresponding system 
has a vanishing zero-temperature entropy, indicating a unique MAP solution. 

Finally, we analyze the tradeoff between system cost and estimation error for L-channel systems, by
assuming an inversely proportional relationship between channel cost and channel noise. Our results suggest 
that if the objective is to achieve low estimation error, then it is more advantageous to build a system with 
larger number of noisier channels, rather than having a single channel with better noise-tolerance. However 
if we can tolerate higher error, it is more beneficial to use a single channel system. An exception is noticed 
for the 2-channel system whose performance relative to a single channel system is dependent on the spin- 
spin interaction (J). For moderate J the estimation error for both the single and two channel systems are 
comparable while at lower J the single channel system is found to perform better to achieve any degree of 
accuracy. 

There are several directions for extending the work presented here. For instance, it will be interesting 
to generalize the analysis presented here beyond the binary HMMs, e.g., by reducing the problem to a
generalized Potts model. Note that the critical behavior observed here is due to two competing tendencies, (a)
accommodating observations and (b) hidden (Markovian) dynamical model. Thus, it is natural to assume that 
similar behavior can be expected in non-binary systems as well. Another interesting problem is to "break" the 
macroscopic degeneracy of the MAP solution space by adding additional constraints and/or objectives. For 
instance, among all the MAP solutions, one might wish to select the one that has the highest overlap with the 
typical realization of the hidden process, which might be useful in the context of parameter learning [16].

8 Appendix

8.1 Analytical error calculation using MAP 

For finding the error expression for a single observation channel using MAP analytically, we use a modified
Hamiltonian (23). To calculate the error estimate we need to find -\partial_g f_g |_{g \to 0, \beta \to \infty},
where f_g represents the free energy of the modified Hamiltonian and is given by,

1 ^ , , [pi 
= -^Lviy^^)t^iik) (47) 

with \xi_k = h y_k + g s_k + A(\xi_{k-1}). Note that the limits are taken in order, i.e., first we take the
limit g → 0 and then β → ∞. Using the recursion relations (27) and (28) we get \partial_g f_g as,

= -^Ip(y,s)i f {tanh(i3(^, +7)) +tanh(/3(^, -/))} {s, + dgA{^,.,)} (48) 

" y,s ^ k=l 

with the term 5gA((^^_i) given by, 

dgA{^k-i) = f {tanh(j3(^,.i +/)) +tanh(j3(^,_i - J))} {s^-i + dgA{^k-2)} (49) 

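Taking the limits and simplifying, we can write the terms in the above expressions as

    \tanh(\beta(\xi_k + J)) + \tanh(\beta(\xi_k - J)) = -2\delta(\xi_k < -J) + 2\delta(\xi_k > J)
    \tanh(\beta(\xi_{k-1} + J)) - \tanh(\beta(\xi_{k-1} - J)) = 2\delta(-J < \xi_{k-1} < J)          (50)

where δ(·) equals 1 when its argument holds and 0 otherwise.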

Using this we can write, 

- <3g/glg^o,;3^. = ^ Ip(y,s) I [-5(1* > J) + 5(1* < -msk + 5{-J< < 7)[.*_i + ...]...] 
' y,s k=l 

= i Ip(y^«) I [fmh+g{^k-i)h-i +g{^k-2)W-2 ...]...] 

y,s k=l 
" y,s k=l 

= ^ I (^Ip(y>s)^a/(&) +g(^*)/(^A.+i) + . . ■+8{^k) . ■■8{^N-i)MN)?j (51) 

where f(\xi_k) = \delta(\xi_k > J) - \delta(\xi_k < -J) and g(\xi_k) = \delta(-J < \xi_k < J). We outline the
process of evaluating a particular term of the inner series, say, the term

£p(y,s)5^g(^i) . ..g{^k+n-l)f{^k+n) 



' The approach can be extended to multi channel system with tedious algebra 
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y-s 

L P{y,s)Skg{^k)---8{^k+n-\)f(.^k+n) 
yi,...,y^;si.,..,si, 

yk+ 1 1 ■ ■ ■ ,.y*+n 1 ■ • >*/t+n 

= 12 P{^k,Sk,yk+U--- ,yk+,uSk+l , ■ ■ ■ , ■ • ■ gi^k+n-l)f{^k+n) 

^k+1 f-,^k+n 

= L P{^k+l,---,Sk+n,yk+\,---,yk+n\^k,Sk)p{^k,Sk)Skg{^k)---g{^k+n-l)f{^k+n) (52) 

ik,Sk\^k+l<---,^k+n 
SA-+lvA+n 

where in (52) we replace the y_k's in the sum with \xi_k's, since each sequence of y_k's corresponds to a
sequence of \xi_k's (for a two-channel system there will be multiple y_k's corresponding to a \xi_k, and this
is treated accordingly), and

    P(\xi_k, s_k) = \sum_{y_k} \omega(\xi_k, s_k, y_k)          (53)

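It is easy to see that the term

    \Lambda_k = \sum_{y,s} p(y,s) \, s_k \left[ f(\xi_k) + g(\xi_k) f(\xi_{k+1}) + \dots
                + g(\xi_k) \cdots g(\xi_{N-1}) f(\xi_N) \right]

is independent of k for N → ∞. For our calculation we approximate the error by keeping only the first
four terms in the expression for \Lambda_k,

    \Lambda_k \approx \sum_{y,s} p(y,s) \, s_k \left[ f(\xi_k) + g(\xi_k) f(\xi_{k+1})
                + g(\xi_k) g(\xi_{k+1}) f(\xi_{k+2}) + g(\xi_k) g(\xi_{k+1}) g(\xi_{k+2}) f(\xi_{k+3}) \right]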

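In the thermodynamic limit, we drop the subscript k, which gives

    -\partial_g f_g |_{g \to 0, \beta \to \infty} = \Lambda

Hence, \Lambda is the overlap between the hidden sequence s_k and its MAP estimate, and the estimation error
is calculated as

    P_{MAP}\{error\} = \frac{1 - \Lambda}{2}          (54)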

Here we present results showing the correspondence between the simulated and the analytical data in the
ranges of the first three phases, h > 2J, 2J > h > J and J > h > 2J/3. In Figure 6 we have already
used this semi-analytical expression to estimate the MAP error for 1- and 2-channel systems at a particular
value of q. In Figure 12 we plot the error calculated for a single observation channel at different values of q.
On truncating the infinite series (39) at the fourth term, we see an exact match between the analytical and
the simulated data for larger values of q. However, around the third phase for q = 0.1 the match is not
exact. This implies that for smaller values of q, higher-order terms of the infinite series must be evaluated
to match the simulated data.



(a) q = 0.1 (b) q = 0.24 (c) q = 0.4

Fig. 12 (Color online) Estimation error from the MAP estimate obtained by simulation (red) and by the analytical formula
(blue) for a single observation channel



8.2 Gaussian observation model 



Now we consider a single Gaussian observation channel. The Ising Hamiltonian for the case of Gaussian 
observation channel is given by, 

Af-l N 

^^G(y,x) = - ^ JxkXk+i - ^ hxkyk (55) 
k=l k=l 

where h = A, being the variance of Gaussian noise. The noise distribution in the observation channel is 
given by, 

1 ta--'i)' 

^(yk\xk) = /— e 2a^ (56) 
V2na 

Using the above equation with p(y|x) = n^=i ^iyk\xk) we get after discarding the irrelevant additive factors, 

A' J 

logp(y|x) = £ --2Xkyk (57) 

k=l " 

The recursion relation is given as

$$\xi_k = h y_k + A(\xi_{k-1}) \qquad (58)$$

where the generic form of the function $A(\cdot)$ is defined in (27). It is easy to see that $-J \le A(\xi) \le J$. Thus, $\xi$
(which we refer to as our state) takes the form $nJ + hy$ where $n \in \{-1, 0, 1\}$. Since the states having $n = 0$
are non-recurrent (see the argument in Section 4), we can write our state space in the form $\pm J + hy$, where
$y \in (-\infty, \infty)$. Thus, our state space is now continuous, as opposed to being discrete in the case when the noise
channel was binary.
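As an illustration of recursion (58), the following is a minimal Python sketch. The form of $A(\cdot)$ is given in (27), which lies outside this section; here it is assumed to clip its argument to $[-J, J]$, which agrees with the bound stated above. The function names, the zero initial condition, the sampler and the parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def clip_A(xi, J):
    # Assumed form of A(.): identity inside [-J, J], saturating at +/-J outside.
    return float(np.clip(xi, -J, J))

def run_recursion(y, J, sigma, xi0=0.0):
    """Iterate xi_k = h*y_k + A(xi_{k-1}) with h = 1/sigma^2."""
    h = 1.0 / sigma**2
    xi = np.empty(len(y))
    prev = xi0  # hypothetical initial condition
    for k, yk in enumerate(y):
        prev = h * yk + clip_A(prev, J)
        xi[k] = prev
    return xi

def sample_channel(n, q, sigma, seed=0):
    """Binary symmetric Markov chain (flip probability q) seen through Gaussian noise."""
    rng = np.random.default_rng(seed)
    flips = rng.random(n) < q
    x = np.where(np.cumsum(flips) % 2 == 0, 1.0, -1.0)
    return x, x + sigma * rng.normal(size=n)

x, y = sample_channel(n=1000, q=0.24, sigma=1.0)
xi = run_recursion(y, J=0.58, sigma=1.0)
offset = xi - y  # h = 1 here, so xi_k - h*y_k equals A(xi_{k-1})
```

Under the clipping assumption, `offset` equals $\pm J$ whenever the previous state satisfies $|\xi_{k-1}| \ge J$, which is the $\pm J + hy$ form of the state discussed above.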

The conditional probabilities for the Gaussian distribution in the observation channel can be written as

$$\omega(\xi,y,z|\xi',y',z') = p(z|z')\,\pi(y|z)\,\varphi(\pm J + hy \,|\, \pm J + hy', y) \qquad (59)$$

The exact expressions for (59), which are used to calculate the state transition probabilities, are tabulated below.



Table 5 Conditional probabilities for the Gaussian distribution of the observation channels, where $\omega(\xi,y,z|\xi',y',z') = \omega(aJ + hy, y, z \,|\, cJ + hy', y', z')$
and the combinations of $(a,z)$, $(c,z')$ are tabulated.



[Table 5 body: rows are indexed by $(a,z)$ and columns by $(c,z')$, each running over $(1,1)$, $(1,-1)$, $(-1,1)$, $(-1,-1)$; the individual entries are not legible in this copy.]
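The quantities used in the table are given as,

[Equation (60): only fragments of the definitions are legible. They are sums of Gaussian kernels of the form $\frac{1}{\sqrt{2\pi}\sigma}(1-q)\,e^{-\frac{(y-1)^2}{2\sigma^2}}\, I(y' > 0)$ and $\frac{1}{\sqrt{2\pi}\sigma}(1-q)\,e^{-\frac{(y-y'-1)^2}{2\sigma^2}}\, I\left(-\frac{2J}{h} < y' < 0\right)$, together with their counterparts for $y' < 0$ and $0 < y' < \frac{2J}{h}$.] (60)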
Fig. 13 (Color online) Plots for $p_{st}$, for $q = 0.24$




Fig. 14 (Color online) Plots for $p_{st}$, for $q = 0.24$



We now explain how to calculate the entries in the 1st row and the 1st and 3rd columns
of Table 5.

- $(a,z,c,z') = (1,1,1,1)$: Qualitatively, this means that we were in a state $\xi' = J + hy'$ with $z' = 1$ and we
moved to $\xi = J + hy$ with $z = 1$. From the conditional probability expression (59), we need to compute

$$\omega(\xi = J + hy, y, 1 \,|\, \xi' = J + hy', y', 1) = p(1|1)\,\pi(y|1)\,\varphi(J + hy \,|\, J + hy', y) = (1-q)\,\pi(y|1)\,\varphi(J + hy \,|\, J + hy', y)$$

Now, we can go from $J + hy'$ to $J + hy$ in two ways:
- When $y' > 0$ (we always assume $h > 0$), $A(J + hy') = J$ and $\xi = A(\xi') + hy = J + hy$, so we should
have $y$ as the channel observation. Thus, we have

$$\pi(y|1)\,\varphi(J + hy \,|\, J + hy', y) = \frac{1}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}} \qquad (61)$$

- When $-\frac{2J}{h} < y' < 0$, $A(J + hy') = J + hy'$ and $\xi = A(\xi') + hy = J + hy' + hy$. This means that if we want to
have $\xi = J + hy$, our observation should be $y - y'$, since this gives $\xi = A(\xi') + h(y - y') = J + hy' + h(y - y') = J + hy$. Thus, we have

$$\pi(y|1)\,\varphi(J + hy \,|\, J + hy', y) = \frac{1}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-y'-1)^2}{2\sigma^2}} \qquad (62)$$

Combining (61) and (62), we get

$$\omega(\xi = J + hy, y, 1 \,|\, \xi' = J + hy', y', 1) = p(1|1)\,\pi(y|1)\,\varphi(J + hy \,|\, J + hy', y) \qquad (63)$$

$$= \frac{1}{\sqrt{2\pi}\sigma}(1-q)\, e^{-\frac{(y-1)^2}{2\sigma^2}}\, I(y' > 0) + \frac{1}{\sqrt{2\pi}\sigma}(1-q)\, e^{-\frac{(y-y'-1)^2}{2\sigma^2}}\, I\left(-\frac{2J}{h} < y' < 0\right) \qquad (64)$$

- $(a,z,c,z') = (1,1,-1,1)$: Qualitatively, this means that we were in a state $\xi' = -J + hy'$ with $z' = 1$ and
we moved to $\xi = J + hy$ with $z = 1$. Again, using the conditional probability expression (59), we have

$$\omega(\xi = J + hy, y, 1 \,|\, \xi' = -J + hy', y', 1) = p(1|1)\,\pi(y|1)\,\varphi(J + hy \,|\, -J + hy', y) = (1-q)\,\pi(y|1)\,\varphi(J + hy \,|\, -J + hy', y)$$

Now, we can go from $-J + hy'$ to $J + hy$ in the following manner. When $y' > \frac{2J}{h}$, $A(-J + hy') = J$ and
$\xi = A(\xi') + hy = J + hy$, so we should have $y$ as the channel observation. Thus, we have

$$\pi(y|1)\,\varphi(J + hy \,|\, -J + hy', y) = \frac{1}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}} \qquad (65)$$

We thus have from (65)

$$\omega(\xi = J + hy, y, 1 \,|\, \xi' = -J + hy', y', 1) = p(1|1)\,\pi(y|1)\,\varphi(J + hy \,|\, -J + hy', y) \qquad (66)$$

$$= (1-q)\,\frac{1}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}}\, I\left(y' > \frac{2J}{h}\right) \qquad (67)$$
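The case analysis above can be collected into a single function. The sketch below is a hypothetical helper (the name `omega_kernel` and its signature are not from the paper) that evaluates $\omega(aJ + hy, y, z \,|\, cJ + hy', y', z')$. It assumes that $A(\cdot)$ clips to $[-J, J]$, as in the two worked cases, and that the remaining table entries follow by the same reasoning.

```python
import math

def omega_kernel(a, y, z, c, yp, zp, J, h, sigma, q):
    """omega(aJ + h*y, y, z | cJ + h*yp, yp, zp) under the clipping assumption for A."""
    A = min(max(c * J + h * yp, -J), J)  # A(xi') for xi' = cJ + h*yp
    if A >= J:
        a_new, shift = 1, 0.0            # saturated at +J: observation is y itself
    elif A <= -J:
        a_new, shift = -1, 0.0           # saturated at -J: observation is y itself
    else:
        a_new, shift = c, yp             # no saturation: observation must be y - yp
    if a != a_new:
        return 0.0                       # this transition cannot occur
    p = 1.0 - q if z == zp else q        # p(z|z') for the binary symmetric chain
    gauss = math.exp(-(y - shift - z) ** 2 / (2 * sigma**2))
    return p * gauss / (math.sqrt(2 * math.pi) * sigma)

# Illustrative parameter values (assumptions): 2J/h = 1 here.
params = dict(J=0.5, h=1.0, sigma=1.0, q=0.24)
first_term = omega_kernel(1, 0.3, 1, 1, 0.5, 1, **params)    # y' > 0, as in (61)
second_term = omega_kernel(1, 0.3, 1, 1, -0.4, 1, **params)  # -2J/h < y' < 0, as in (62)
blocked = omega_kernel(1, 0.3, 1, -1, 0.5, 1, **params)      # y' < 2J/h, indicator in (67) is 0
allowed = omega_kernel(1, 0.3, 1, -1, 1.5, 1, **params)      # y' > 2J/h, as in (67)
```

For a given previous state exactly one of the three branches applies, so the indicator functions in (64) and (67) never overlap.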

The other entries in Table 5 can be computed in a similar manner. The stationary distribution
can be obtained from the conditional probabilities through

$$\int \omega(\xi,y,z|\xi',y',z')\, p_{st}(\xi',y',z')\, d\xi'\, dy'\, dz' = p_{st}(\xi,y,z) \qquad (68)$$

Now let us define for convenience,

$$\omega(\xi = aJ + hy, y, z \,|\, \xi' = cJ + hy', y', z') = \Gamma(a,y,z|c,y',z') \qquad (69)$$
$$p_{st}(\xi = aJ + hy, y, z) = \Xi(a,y,z) \qquad (70)$$

With the above notation we can conveniently write the coupled integral equations for the stationary
probabilities as

$$\Xi(a,y,z) = \sum_{c} \sum_{z'} \int \Gamma(a,y,z|c,y',z')\, \Xi(c,y',z')\, dy' \qquad (71)$$
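It can be seen that there are four combinations of $a$ and $z$, i.e., $(a,z) \in \{(1,1), (1,-1), (-1,1), (-1,-1)\}$.
Hence, we get four coupled integral equations as follows

$$\begin{aligned}
\Xi(1,y,1) ={}& \int_{0}^{\infty} \frac{1-q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}}\, \Xi(1,y',1)\, dy' + \int_{-\frac{2J}{h}}^{0} \frac{1-q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-y'-1)^2}{2\sigma^2}}\, \Xi(1,y',1)\, dy' \\
&+ \int_{0}^{\infty} \frac{q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}}\, \Xi(1,y',-1)\, dy' + \int_{-\frac{2J}{h}}^{0} \frac{q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-y'-1)^2}{2\sigma^2}}\, \Xi(1,y',-1)\, dy' \\
&+ \int_{\frac{2J}{h}}^{\infty} \frac{1-q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}}\, \Xi(-1,y',1)\, dy' + \int_{\frac{2J}{h}}^{\infty} \frac{q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}}\, \Xi(-1,y',-1)\, dy'
\end{aligned}$$

$$\begin{aligned}
\Xi(1,y,-1) ={}& \int_{0}^{\infty} \frac{q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y+1)^2}{2\sigma^2}}\, \Xi(1,y',1)\, dy' + \int_{-\frac{2J}{h}}^{0} \frac{q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-y'+1)^2}{2\sigma^2}}\, \Xi(1,y',1)\, dy' \\
&+ \int_{0}^{\infty} \frac{1-q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y+1)^2}{2\sigma^2}}\, \Xi(1,y',-1)\, dy' + \int_{-\frac{2J}{h}}^{0} \frac{1-q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-y'+1)^2}{2\sigma^2}}\, \Xi(1,y',-1)\, dy' \\
&+ \int_{\frac{2J}{h}}^{\infty} \frac{q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y+1)^2}{2\sigma^2}}\, \Xi(-1,y',1)\, dy' + \int_{\frac{2J}{h}}^{\infty} \frac{1-q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y+1)^2}{2\sigma^2}}\, \Xi(-1,y',-1)\, dy'
\end{aligned}$$

$$\begin{aligned}
\Xi(-1,y,1) ={}& \int_{-\infty}^{0} \frac{1-q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}}\, \Xi(-1,y',1)\, dy' + \int_{0}^{\frac{2J}{h}} \frac{1-q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-y'-1)^2}{2\sigma^2}}\, \Xi(-1,y',1)\, dy' \\
&+ \int_{-\infty}^{0} \frac{q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}}\, \Xi(-1,y',-1)\, dy' + \int_{0}^{\frac{2J}{h}} \frac{q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-y'-1)^2}{2\sigma^2}}\, \Xi(-1,y',-1)\, dy' \\
&+ \int_{-\infty}^{-\frac{2J}{h}} \frac{1-q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}}\, \Xi(1,y',1)\, dy' + \int_{-\infty}^{-\frac{2J}{h}} \frac{q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-1)^2}{2\sigma^2}}\, \Xi(1,y',-1)\, dy'
\end{aligned}$$

$$\begin{aligned}
\Xi(-1,y,-1) ={}& \int_{-\infty}^{0} \frac{q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y+1)^2}{2\sigma^2}}\, \Xi(-1,y',1)\, dy' + \int_{0}^{\frac{2J}{h}} \frac{q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-y'+1)^2}{2\sigma^2}}\, \Xi(-1,y',1)\, dy' \\
&+ \int_{-\infty}^{0} \frac{1-q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y+1)^2}{2\sigma^2}}\, \Xi(-1,y',-1)\, dy' + \int_{0}^{\frac{2J}{h}} \frac{1-q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y-y'+1)^2}{2\sigma^2}}\, \Xi(-1,y',-1)\, dy' \\
&+ \int_{-\infty}^{-\frac{2J}{h}} \frac{q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y+1)^2}{2\sigma^2}}\, \Xi(1,y',1)\, dy' + \int_{-\infty}^{-\frac{2J}{h}} \frac{1-q}{\sqrt{2\pi}\sigma}\, e^{-\frac{(y+1)^2}{2\sigma^2}}\, \Xi(1,y',-1)\, dy'
\end{aligned}$$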

These equations are solved numerically to find $p_{st}(\xi,y,z)$ using (70). The probability density function (PDF)
and the cumulative distribution function (CDF) are shown for two extreme values of $\sigma$ in Figures 13 and 14.
From these, it is easy to see that the distribution is continuous, and the entropy, which is given by (38), is
found to be zero.
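The numerical scheme used to solve the coupled integral equations is not stated. One simple possibility, sketched below under stated assumptions, is to discretise $y$ on a uniform grid, turn (71) into a matrix equation and apply power iteration. The function name, grid size, truncation range, iteration count and the values of $J$, $\sigma$ and $q$ are all illustrative, and $A(\cdot)$ is again assumed to clip to $[-J, J]$.

```python
import numpy as np

def stationary_density(J=0.58, sigma=1.0, q=0.24, y_max=8.0, m=161, iters=300):
    """Power-iterate a grid discretisation of Eq. (71); returns Xi on the grid."""
    h = 1.0 / sigma**2
    y = np.linspace(-y_max, y_max, m)
    dy = y[1] - y[0]
    labels = [(1, 1), (1, -1), (-1, 1), (-1, -1)]  # (a, z) combinations
    K = np.zeros((4 * m, 4 * m))
    for col, (c, zp) in enumerate(labels):
        # A(xi') for xi' = cJ + h*y', assumed to be clipping to [-J, J]
        A = np.clip(c * J + h * y, -J, J)
        a_new = np.where(A >= J, 1, np.where(A <= -J, -1, c))
        shift = np.where(np.abs(A) < J, y, 0.0)  # y' is carried over if unsaturated
        for row, (a, z) in enumerate(labels):
            p = 1.0 - q if z == zp else q
            # Gaussian kernel on the grid: rows index y, columns index y'
            g = np.exp(-(y[:, None] - shift[None, :] - z) ** 2 / (2 * sigma**2))
            g = g / (np.sqrt(2 * np.pi) * sigma)
            block = p * g * (a_new[None, :] == a) * dy
            K[row * m:(row + 1) * m, col * m:(col + 1) * m] = block
    xi = np.full(4 * m, 1.0 / (4 * m * dy))  # uniform starting guess
    for _ in range(iters):
        xi = K @ xi
        xi = xi / (xi.sum() * dy)  # keep total probability equal to one
    return y, dy, K, xi.reshape(4, m)

y, dy, K, Xi = stationary_density()
residual = np.abs(K @ Xi.ravel() - Xi.ravel()).max()
mass_z_plus = (Xi[0].sum() + Xi[2].sum()) * dy  # total weight of z = +1
```

Each row of `Xi` holds one of the four functions $\Xi(a,y,z)$ in the order of `labels`. A Riemann sum is only first-order accurate near the jumps at $y' = 0$ and $y' = \pm\frac{2J}{h}$, so a finer grid or a quadrature rule aligned with those points would be a natural refinement.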